ABSTRACT

We present a new role system for specifying changing refer- 
encing relationships of heap objects. The role of an object 
depends, in large part, on its aliasing relationships with other 
objects, with the role of each object changing as its aliasing 
relationships change. Roles therefore capture important ob- 
ject and data structure properties and provide useful infor- 
mation about how the actions of the program interact with 
these properties. Our role system enables the programmer 
to specify the legal aliasing relationships that define the set 
of roles that objects may play, the roles of procedure param- 
eters and object fields, and the role changes that procedures 
perform while manipulating objects. We present an inter- 
procedural, compositional, and context-sensitive role analy- 
sis algorithm that verifies that a program respects the role 
constraints. 

1 Introduction

Types capture important properties of the objects that programs
manipulate, increasing both the safety and readability
of the program. Traditional type systems capture properties 
(such as the format of data items stored in the fields of the 
object) that are invariant over the lifetime of the object. But 
in many cases, properties that do change are as important 
as properties that do not. Recognizing the benefit of capturing
these changes, researchers have developed systems in
which the type of the object changes as the values stored in
its fields change or as the program invokes operations on the
object. These systems integrate
the concept of changing object states into the type system.

The fundamental idea in this paper is that the state of 
each object also depends on the data structures in which it 
participates. Our type system therefore captures the refer- 
encing relationships that determine this data structure par- 
ticipation. As objects move between data structures, their 
types change to reflect their changing relationships with 
other objects. Our system uses roles to formalize the con- 
cept of a type that depends on the referencing relationships. 
Each role declaration provides complete aliasing information 
for each object that plays that role — in addition to specify- 
ing roles for the fields of the object, the role declaration also 
identifies the complete set of references in the heap that refer 
to the object. In this way roles generalize linear type systems
[45] by allowing multiple aliases to be statically
tracked, and extend alias types [42] with the ability to
specify roles of objects that are the source of aliases.

This approach attacks a key difficulty associated with 
state-based type systems: the need to ensure that any state 
change performed using one alias is correctly reflected in the 
declared types of the other aliases. Because each object's 
role identifies all of its heap aliases, the analysis can verify 
the correctness of the role information at all remaining or 
new heap aliases after an operation changes the referencing 
relationships. 

Roles capture important object and data structure prop- 
erties, improving both the safety and transparency of the 
program. For example, roles allow the programmer to express
data structure consistency properties (with the properties
verified by the role analysis), to improve the precision of
procedure interface specifications (by allowing the programmer
to specify the role of each parameter), to express precise
referencing and interaction behaviors between objects (by 
specifying verified roles for object fields and aliases), and to 
express constraints on the coordinated movements of objects 
between data structures (by using the aliasing information in 
role definitions to identify legal data structure membership 
combinations). Roles may also aid program optimization by 
providing precise aliasing information. 

This paper makes the following contributions: 

• Role Concept: The concept that the state of an ob- 
ject depends on its referencing relationships; specifi- 
cally, that objects with different heap aliases should be 
regarded as having different states. 

• Role Definition Language: It presents a language 
for defining roles. The programmer can use this lan- 
guage to express data structure invariants and proper- 
ties such as data structure participation. 

• Programming Model: It presents a set of role con- 
sistency rules. These rules give a programming model 
for changing the role of an object and the circumstances 
under which roles can be temporarily violated. 

• Procedure Interface Specification Language: It
presents a language for specifying the initial context
and effects of each procedure. The effects summarize
the actions of the procedure in terms of the references
it changes and the regions of the heap that it affects.

• Role Analysis Algorithm: It presents an algorithm
for verifying that the program respects the constraints
given by a set of role definitions and procedure
specifications. The algorithm uses a data-flow analysis to
infer intermediate referencing relationships between objects,
allowing the programmer to focus on role changes
and procedure interfaces.

Figure 1: Role Reference Diagram for Scheduler

2 Example 

Figure 1 presents a role reference diagram for a process
scheduler. Each box in the diagram denotes a disjoint set of 
objects of a given role. The labelled arrows between boxes 
indicate possible references between the objects in each set. 
As the diagram indicates, the scheduler maintains a list of 
live processes. A live process can be either running or sleeping.
The running processes form a doubly-linked list, while
sleeping processes form a binary tree. Both kinds of pro- 
cesses have proc references from the live list nodes LiveList. 
Header objects RunningHeader and SleepingTree simplify 
operations on the data structures that store the process ob- 
jects. 

As Figure 1 shows, data structure participation determines
the conceptual state of each object. In our example,
processes that participate in the sleeping process tree
data structure are classified as sleeping processes, while pro- 
cesses that participate in the running process list data struc- 
ture are classified as running processes. Moreover, move- 
ments between data structures correspond to conceptual 
state changes — when a process stops sleeping and starts run- 
ning, it moves from the sleeping process tree to the running 
process list. 

2.1 Role Definitions 

Figure 2 presents the role definitions for the objects in our
example.^1 Each role definition specifies the constraints that
an object must satisfy to play the role. Field constraints
specify the roles of the objects to which the fields refer, while
slot constraints identify the number and kind of aliases of the
object.

^1 In general, each role definition would specify the static
class of objects that can play that role.

role LiveHeader {
    fields next : LiveList | null;
}

// cells of the list of all live processes
role LiveList {
    fields next : LiveList | null,
           proc : RunningProc | SleepingProc;
    slots LiveList.next | LiveHeader.next;
    acyclic next;
}

// header of the cyclic doubly-linked list of running processes
role RunningHeader {
    fields next : RunningProc | RunningHeader,
           prev : RunningProc | RunningHeader;
    slots RunningHeader.next | RunningProc.next,
          RunningHeader.prev | RunningProc.prev;
    identities next.prev, prev.next;
}

role RunningProc {
    fields next : RunningProc | RunningHeader,
           prev : RunningProc | RunningHeader;
    slots RunningHeader.next | RunningProc.next,
          RunningHeader.prev | RunningProc.prev,
          LiveList.proc;
    identities next.prev, prev.next;
}

// header of the binary tree of sleeping processes
role SleepingTree {
    fields root : SleepingProc | null;
    acyclic left, right;
}

role SleepingProc {
    fields left : SleepingProc | null,
           right : SleepingProc | null;
    slots SleepingProc.left | SleepingProc.right |
          SleepingTree.root,
          LiveList.proc;
    acyclic left, right;
}

role DeadProc { }

Figure 2: Role Definitions for a Scheduler 

Role definitions may also contain two additional kinds of 
constraints: identity constraints, which specify paths that 
lead back to the object, and acyclicity constraints, which 
specify paths with no cycles. In our example, the identity 
constraint next.prev in the RunningProc role specifies the
cyclic doubly-linked list constraint that following the next,
then prev fields always leads back to the initial object. The
acyclic constraint left, right in the SleepingProc role
specifies that there are no cycles in the heap involving only 
left and right edges. On the other hand, the list of run- 
ning processes must be cyclic because its nodes can never 
point to null. 

The slot constraints specify the complete set of heap 
aliases for the object. In our example, this implies that no 
process can be simultaneously running and sleeping. 

To simplify the presentation, we assume that all objects are
instances of a single class with a set of fields F.



In general, roles can capture data structure consistency 
properties such as disjointness and can prevent representation
exposure. As a data structure description language,
roles can naturally specify trees with additional pointers. 
Roles can also approximate non-tree data structures like 
sparse matrices. Because most role constraints are local, 
it is possible to inductively infer them from data structure 
instances. 

2.2 Roles and Procedure Interfaces 

Procedures specify the initial and final roles of their parameters.
The suspend procedure in Figure 3, for example, takes
two parameters: an object with role RunningProc p, and 
the SleepingTree s. The procedure changes the role of the 
object referenced by p to SleepingProc whereas the object 
referenced by s retains its original role. To perform the role 
change, the procedure removes p from its RunningList data 
structure and inserts it into the SleepingTree data struc- 
ture s. If the procedure fails to perform the insertions or 
deletions correctly, for instance by leaving an object in both 
structures, the role analysis will report an error. 

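procedure suspend(p : RunningProc ->> SleepingProc,
                  s : SleepingTree)
local pp, pn, r;
{
    // save the list neighbors of p and the current tree root
    pp = p.prev; pn = p.next;
    r = s.root;
    // unlink p from the running list
    p.prev = null; p.next = null;
    pp.next = pn; pn.prev = pp;
    // insert p as the new root of the sleeping tree
    s.root = p; p.left = r;
    setRole(p : SleepingProc);
}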

Figure 3: Suspend Procedure 
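
The effect of suspend on the aliases of p can be traced on a toy heap. The following Python sketch is only an illustration and not part of the role system: the heap encoding as nested dictionaries, the helper `aliases`, and the object names `header`, `p1`, `tree`, and `cell` are all assumptions introduced here. It mirrors the statements of Figure 3 and records which (source, field) pairs reference the process before and after the call.

```python
# Toy heap: object name -> {field name -> target object name or None}.
# All names below are hypothetical and chosen only for this illustration.

def aliases(heap, target):
    """Return the set of (source, field) pairs that reference `target`."""
    return {(src, f)
            for src, fields in heap.items()
            for f, dst in fields.items()
            if dst == target}

def suspend(heap, p, s):
    """Mirror the statements of the suspend procedure on the toy heap."""
    pp, pn = heap[p]["prev"], heap[p]["next"]
    r = heap[s]["root"]
    heap[p]["prev"] = None
    heap[p]["next"] = None
    heap[pp]["next"] = pn
    heap[pn]["prev"] = pp
    heap[s]["root"] = p
    heap[p]["left"] = r

heap = {
    "header": {"next": "p1", "prev": "p1"},          # RunningHeader
    "p1": {"next": "header", "prev": "header",
           "left": None, "right": None},             # the process
    "tree": {"root": None},                          # SleepingTree
    "cell": {"proc": "p1"},                          # LiveList cell
}

before = aliases(heap, "p1")
suspend(heap, "p1", "tree")
after = aliases(heap, "p1")
print(sorted(before))
print(sorted(after))
```

Before the call the process is referenced through `header.next`, `header.prev`, and `cell.proc`, which matches the three slots of RunningProc. Afterwards it is referenced through `tree.root` and `cell.proc`, which matches the two slots of SleepingProc.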



3 Abstract Syntax and Semantics of Roles 

In this section, we precisely define what it means for a given 
heap to satisfy a set of role definitions. In subsequent sec- 
tions we will use this definition as a starting point for a 
programming model and role analysis. 

3.1 Heap Representation 

We represent a concrete program heap as a finite directed
graph $H_c$ with $\mathrm{nodes}(H_c)$ representing objects of the heap
and labelled edges representing heap references. A graph
edge $\langle o_1, f, o_2 \rangle \in H_c$ denotes a reference with field name $f$
from object $o_1$ to object $o_2$. To simplify the presentation, we
fix a global set of fields $F$ and assume that all objects have
all fields in $F$. We do not consider subtyping or dynamic
dispatch in this paper.

3.2 Role Representation 

Let $R$ denote the set of roles used in role definitions, let $\mathrm{null}_R$ be
a special symbol always denoting the null object $\mathrm{null}_c$, and let
$R_0 = R \cup \{\mathrm{null}_R\}$. We represent each role as the conjunction
of the following four kinds of constraints:

• Fields: For every field name $f \in F$ we introduce a
function $\mathrm{field}_f : R \to 2^{R_0}$ denoting the set of roles
that objects of role $r \in R$ can reference through field
$f$. A field $f$ of role $r$ can be null if and only if
$\mathrm{null}_R \in \mathrm{field}_f(r)$. The explicit use of $\mathrm{null}_R$ and the
possibility to specify a set of alternative roles for every field
allow roles to express both may and must referencing
relationships.

• Slots: Every role $r$ has $\mathrm{slotno}(r)$ slots. A slot $\mathrm{slot}_k(r)$ of
role $r \in R$ is a subset of $R \times F$. Let $o$ be an object of role
$r$ and $o'$ an object of role $r'$. A reference $\langle o', f, o \rangle \in H_c$
can fill a slot $k$ of object $o$ if and only if $\langle r', f \rangle \in \mathrm{slot}_k(r)$.
An object with role $r$ must have each of its slots filled
by exactly one reference.

• Identities: Every role $r \in R$ has a set $\mathrm{identities}(r) \subseteq
F \times F$. Identities are pairs of fields $\langle f, g \rangle$ such that
following reference $f$ on object $o$ and then returning on
reference $g$ leads back to $o$.

• Acyclicities: Every role $r \in R$ has a set $\mathrm{acyclic}(r) \subseteq F$
of fields along which cycles are forbidden.
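
The four kinds of constraints map naturally onto a small record type. The following Python sketch shows one possible encoding and is not the representation used by the analysis: the class name `Role`, its attribute names, and the string `"null_R"` standing for the null role symbol are assumptions made for this illustration. The two instances transcribe the RunningProc and SleepingProc definitions of Figure 2.

```python
from dataclasses import dataclass, field

NULL_ROLE = "null_R"  # stands for the special null role symbol (assumed encoding)

@dataclass
class Role:
    name: str
    fields: dict                 # field name -> set of role names the field may reference
    slots: list                  # one set of (source role, field) pairs per slot
    identities: set = field(default_factory=set)   # pairs (f, g) of field names
    acyclic: set = field(default_factory=set)      # field names along which cycles are forbidden

    def slotno(self):
        """Number of slots, i.e. the exact number of heap aliases of the object."""
        return len(self.slots)

    def may_be_null(self, f):
        """A field may be null iff the null role is among its alternatives."""
        return NULL_ROLE in self.fields[f]

running_proc = Role(
    name="RunningProc",
    fields={"next": {"RunningProc", "RunningHeader"},
            "prev": {"RunningProc", "RunningHeader"}},
    slots=[{("RunningHeader", "next"), ("RunningProc", "next")},
           {("RunningHeader", "prev"), ("RunningProc", "prev")},
           {("LiveList", "proc")}],
    identities={("next", "prev"), ("prev", "next")},
)

sleeping_proc = Role(
    name="SleepingProc",
    fields={"left": {"SleepingProc", NULL_ROLE},
            "right": {"SleepingProc", NULL_ROLE}},
    slots=[{("SleepingProc", "left"), ("SleepingProc", "right"),
            ("SleepingTree", "root")},
           {("LiveList", "proc")}],
    acyclic={"left", "right"},
)

print(running_proc.slotno(), running_proc.may_be_null("next"))
print(sleeping_proc.slotno(), sleeping_proc.may_be_null("left"))
```

In this encoding a must relationship is a field whose set of alternatives has a single non-null role, while a may relationship lists several alternatives or includes the null role.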

3.3 Role Semantics 

We define the semantics of roles as a conjunction of invariants
associated with role definitions. A concrete role assignment
is a map $\rho_c : \mathrm{nodes}(H_c) \to R_0$ such that $\rho_c(\mathrm{null}_c) =
\mathrm{null}_R$.

Definition 1 Given a set of role definitions, we say that
heap $H_c$ is role consistent iff there exists a role assignment
$\rho_c : \mathrm{nodes}(H_c) \to R_0$ such that for every $o \in \mathrm{nodes}(H_c)$ the
predicate $\mathrm{locallyConsistent}(o, H_c, \rho_c)$ is satisfied. We call any
such role assignment $\rho_c$ a valid role assignment.

The predicate $\mathrm{locallyConsistent}(o, H_c, \rho_c)$ formalizes the
constraints associated with role definitions.

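Definition 2 $\mathrm{locallyConsistent}(o, H_c, \rho_c)$ iff all of the
following conditions are met. Let $r = \rho_c(o)$.

1) For every field $f \in F$ and $\langle o, f, o' \rangle \in H_c$, $\rho_c(o') \in
\mathrm{field}_f(r)$.

2) Let $\{\langle o_1, f_1 \rangle, \ldots, \langle o_k, f_k \rangle\} = \{\langle o', f \rangle \mid \langle o', f, o \rangle \in H_c\}$
be the set of all aliases of object $o$. Then $k =
\mathrm{slotno}(r)$ and there exists some permutation $p$ of the
set $\{1, \ldots, k\}$ such that $\langle \rho_c(o_i), f_i \rangle \in \mathrm{slot}_{p_i}(r)$ for all $i$.

3) If $\langle o, f, o' \rangle \in H_c$, $\langle o', g, o'' \rangle \in H_c$, and
$\langle f, g \rangle \in \mathrm{identities}(r)$, then $o = o''$.

4) It is not the case that graph $H_c$ contains a cycle
$o_1, f_1, \ldots, o_s, f_s, o_1$ where $o_1 = o$ and
$f_1, \ldots, f_s \in \mathrm{acyclic}(r)$.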

Note that a role consistent heap may have multiple valid
role assignments $\rho_c$. However, in each of these role
assignments, every object $o$ is assigned exactly one role $\rho_c(o)$.
The existence of a role assignment $\rho_c$ with the property
$\rho_c(o_1) \neq \rho_c(o_2)$ thus implies $o_1 \neq o_2$. This is just one of
the ways in which roles make aliasing more predictable.
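
The four conditions of local consistency can be checked directly on a small explicit heap. The Python sketch below is a brute-force illustration under stated assumptions, not the role analysis algorithm: the heap is a set of triples, roles are plain dictionaries, the role assignment `rho` is given rather than searched for, a field that a role does not declare is assumed to admit only the null role, and the null object itself is skipped. The object names `h`, `c`, `t`, and `p` are hypothetical.

```python
from itertools import permutations

NULL_OBJ, NULL_ROLE = "null_c", "null_R"   # assumed encodings of the null object and role

def locally_consistent(o, heap, rho, roles):
    """Check the four local consistency conditions for object o."""
    role = roles[rho[o]]
    # 1) every outgoing reference targets one of the roles allowed for its field
    for (src, f, dst) in heap:
        if src == o and rho[dst] not in role["fields"].get(f, {NULL_ROLE}):
            return False
    # 2) the aliases of o fill the slots one-to-one, under some permutation
    refs = [(src, f) for (src, f, dst) in heap if dst == o]
    slots = role["slots"]
    if len(refs) != len(slots):
        return False
    if not any(all((rho[src], f) in slots[perm[i]]
                   for i, (src, f) in enumerate(refs))
               for perm in permutations(range(len(slots)))):
        return False
    # 3) following f and then g leads back to o for every identity (f, g)
    for (src, f, mid) in heap:
        if src != o:
            continue
        for (src2, g, dst) in heap:
            if src2 == mid and (f, g) in role["identities"] and dst != o:
                return False
    # 4) no cycle through o that uses only fields in acyclic(r)
    frontier, seen = [o], set()
    while frontier:
        cur = frontier.pop()
        for (src, f, dst) in heap:
            if src == cur and f in role["acyclic"]:
                if dst == o:
                    return False
                if dst not in seen:
                    seen.add(dst)
                    frontier.append(dst)
    return True

def check_assignment(heap, rho, roles):
    """Check a given assignment on all objects except the null object."""
    return all(locally_consistent(o, heap, rho, roles)
               for o in rho if o != NULL_OBJ)

roles = {
    "LiveHeader": {"fields": {"next": {"LiveList", NULL_ROLE}},
                   "slots": [], "identities": set(), "acyclic": set()},
    "LiveList": {"fields": {"next": {"LiveList", NULL_ROLE},
                            "proc": {"RunningProc", "SleepingProc"}},
                 "slots": [{("LiveList", "next"), ("LiveHeader", "next")}],
                 "identities": set(), "acyclic": {"next"}},
    "SleepingTree": {"fields": {"root": {"SleepingProc", NULL_ROLE}},
                     "slots": [], "identities": set(),
                     "acyclic": {"left", "right"}},
    "SleepingProc": {"fields": {"left": {"SleepingProc", NULL_ROLE},
                                "right": {"SleepingProc", NULL_ROLE}},
                     "slots": [{("SleepingProc", "left"), ("SleepingProc", "right"),
                                ("SleepingTree", "root")},
                               {("LiveList", "proc")}],
                     "identities": set(), "acyclic": {"left", "right"}},
}
rho = {"h": "LiveHeader", "c": "LiveList", "t": "SleepingTree",
       "p": "SleepingProc", NULL_OBJ: NULL_ROLE}
heap = {("h", "next", "c"), ("c", "next", NULL_OBJ), ("c", "proc", "p"),
        ("t", "root", "p"), ("p", "left", NULL_OBJ), ("p", "right", NULL_OBJ)}

# Make p its own left child: p gains a third alias and a left-cycle.
bad_heap = (heap - {("p", "left", NULL_OBJ)}) | {("p", "left", "p")}

print(check_assignment(heap, rho, roles))
print(locally_consistent("p", bad_heap, rho, roles))
```

Trying all slot permutations is exponential in the number of slots, which is acceptable here because roles typically have only a handful of slots. A bipartite matching would serve the same purpose for larger slot counts.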



4 Role Properties 

Roles capture important properties of the objects and pro- 
vide useful information about how the actions of the program 
affect those properties. 

• Consistency Properties: Roles can ensure that the 
program respects application-level data structure con- 
sistency properties. The roles in our process scheduler, 
for example, ensure that a process cannot be simulta- 
neously sleeping and running. 

• Interface Changes: In many cases, the interface of an 
object changes as its referencing relationships change. 
In our process scheduler, for example, only running pro- 
cesses can be suspended. Because procedures declare 
the roles of their parameters, the role system can en- 
sure that the program uses objects correctly even as the 
object's interface changes. 

• Multiple Uses: Code factoring minimizes code dupli- 
cation by producing general-purpose classes (such as 
the Java Vector and Hashtable classes) that can be 
used in a variety of contexts. But this practice ob- 
scures the different purposes that different instances of 
these classes serve in the computation. Because each in- 
stance's purpose is usually reflected in its relationships 
with other objects, roles can often recapture these dis- 
tinctions. 

• Correlated Relationships: In many cases, groups 
of objects cooperate to implement a piece of function- 
ality. Standard type declarations provide some infor- 
mation about these collaborations by identifying the 
points-to relationships between related objects at the 
granularity of classes. But roles can capture a much 
more precise notion of cooperation, because they track 
correlated state changes of related objects. 

Programmers can use roles for specifying the membership 
of objects in data structures and the structural invariants 
of data structures. In both cases, the slot constraints are 
essential. 

When used to describe membership of an object in a data 
structure, slots specify the source of the alias from a data 
structure node that stores the object. By assigning different 
sets of roles to data structures used at different program 
points, it is possible to distinguish nodes stored in different 
data structure instances. As an object moves between data 
structures, the role of the object changes appropriately to 
reflect the new source of the alias. 

When describing nodes of data structures, slot constraints 
specify the aliasing constraints of nodes; this is enough to 
precisely describe a variety of data structures and approximate
many others. Property 16 below shows how to identify
trees in role definitions even if tree nodes have additional
aliases from other sets of nodes. It is also possible to define
nodes which make up a compound data structure linked via 
disjoint sets of fields, such as threaded trees, sparse matrices 
and skip lists. 

Example 3 The following role definitions specify a sparse 
matrix of width and height at least 3. These definitions can 
be easily constructed from a sketch of a sparse matrix, as in
Figure 4.
role A1 {
    fields right : A2, down : A4;
    acyclic right, down;
}

role A2 {
    fields right : A2 | A3, down : A5;
    slots A1.right | A2.right;
    acyclic right, down;
}

role A3 {
    fields down : A6;
    slots A2.right;
    acyclic right, down;
}

role A4 {
    fields right : A5, down : A4 | A7;
    slots A1.down | A4.down;
    acyclic right, down;
}

role A5 {
    fields right : A5 | A6, down : A5 | A8;
    slots A4.right | A5.right, A2.down | A5.down;
    acyclic right, down;
}

role A6 {
    fields down : A6 | A9;
    slots A5.right, A3.down | A6.down;
    acyclic right, down;
}

role A7 {
    fields right : A8;
    slots A4.down;
    acyclic right, down;
}

role A8 {
    fields right : A8 | A9;
    slots A7.right | A8.right, A5.down;
    acyclic right, down;
}

role A9 {
    slots A8.right, A6.down;
    acyclic right, down;
}

Figure 4: Roles of Nodes of a Sparse Matrix


Figure 5: Sketch of a Two-Level Skip List 



Example 4 We next give role definitions for a two-level
skip list sketched in Figure 5.

role SkipList {
    fields one : OneNode | TwoNode | null,
           two : TwoNode | null;
}

role OneNode {
    fields one : OneNode | TwoNode | null,
           two : null;
    slots OneNode.one | TwoNode.one | SkipList.one;
    acyclic one, two;
}

role TwoNode {
    fields one : OneNode | TwoNode | null,
           two : TwoNode | null;
    slots OneNode.one | TwoNode.one | SkipList.one,
          TwoNode.two | SkipList.two;
    acyclic one, two;
}

4.1 Formal Properties of Roles 

In this section we identify some of the invariants expressible 
using sets of mutually recursive role definitions. A further 
study of role properties can be found in 

The following properties show some of the ways role spec- 
ifications make object aliasing more predictable. They are 
an immediate consequence of the semantics of roles. 

Property 5 (Role Disjointness)
If there exists a valid role assignment $\rho_c$ for $H_c$ such that
$\rho_c(o_1) \neq \rho_c(o_2)$, then $o_1 \neq o_2$.

The previous property gives a simple criterion for showing
that objects $o_1$ and $o_2$ are unaliased: find a valid role
assignment which assigns different roles to $o_1$ and $o_2$. This
use of roles generalizes the use of static types for pointer
analysis [12]. Since roles create a finer partition of objects
than a typical static type system, their potential for proving
absence of aliasing is even larger.

Property 6 (Disjointness Propagation)
If $\langle o_1, f, o_2 \rangle, \langle o_3, g, o_4 \rangle \in H_c$, $o_1 \neq o_3$, and there exists a valid
role assignment $\rho_c$ for $H_c$ such that $\rho_c(o_2) = \rho_c(o_4) = r$ but
$\mathrm{field}_f(r) \cap \mathrm{field}_g(r) = \emptyset$, then $o_2 \neq o_4$.

Property 7 (Generalized Uniqueness)
If $\langle o_1, f, o_2 \rangle, \langle o_3, g, o_4 \rangle \in H_c$, $o_1 \neq o_3$, and there exists a
role assignment $\rho_c$ such that $\rho_c(o_2) = \rho_c(o_4) = r$, but there
are no indices $i \neq j$ such that $\langle \rho_c(o_1), f \rangle \in \mathrm{slot}_i(r)$ and
$\langle \rho_c(o_3), g \rangle \in \mathrm{slot}_j(r)$, then $o_2 \neq o_4$.

A special case of Property 7 occurs when $\mathrm{slotno}(r) = 1$; this
constrains all references to objects of role $r$ to be unique.

Role definitions induce a role reference diagram RRD
which captures some, but not all, role constraints.

Definition 8 (Role Reference Diagram)
Given a set of definitions of roles $R$, a role reference diagram
RRD is a directed graph with nodes $R_0$ and labelled edges
defined by

$$\mathrm{RRD} = \{\langle r, f, r' \rangle \mid r' \in \mathrm{field}_f(r) \text{ and } \exists i\; \langle r, f \rangle \in \mathrm{slot}_i(r')\}
\cup \{\langle r, f, \mathrm{null}_R \rangle \mid \mathrm{null}_R \in \mathrm{field}_f(r)\}$$
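
Definition 8 can be evaluated mechanically from the field and slot constraints. The Python sketch below is a minimal illustration, assuming roles are encoded as dictionaries with `fields` and `slots` entries and that the string `"null_R"` stands for the null role. The role `Stray` is hypothetical and is included only to show that a declared field target produces no edge unless the target role has a matching slot; the LiveList definition is trimmed to its next field.

```python
NULL_ROLE = "null_R"   # assumed encoding of the null role symbol

def role_reference_diagram(roles):
    """Return the set of labelled edges (r, f, r') of the role reference diagram."""
    edges = set()
    for r, spec in roles.items():
        for f, targets in spec["fields"].items():
            for t in targets:
                if t == NULL_ROLE:
                    # null edges need no matching slot
                    edges.add((r, f, NULL_ROLE))
                elif any((r, f) in slot for slot in roles[t]["slots"]):
                    # the target role must accept (r, f) in one of its slots
                    edges.add((r, f, t))
    return edges

roles = {
    "LiveHeader": {"fields": {"next": {"LiveList", NULL_ROLE}}, "slots": []},
    "LiveList": {"fields": {"next": {"LiveList", NULL_ROLE}},
                 "slots": [{("LiveList", "next"), ("LiveHeader", "next")}]},
    # Hypothetical role: LiveList has no slot for (Stray, next).
    "Stray": {"fields": {"next": {"LiveList"}}, "slots": []},
}

rrd = role_reference_diagram(roles)
for edge in sorted(rrd):
    print(edge)
```

The diagram keeps only the may-reference structure between roles. The number of slots of each role, and therefore the exact number of aliases, is not visible in the resulting edge set.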

Each role reference diagram is a refinement of the corre- 
sponding class diagram in a statically typed language, be- 
cause it partitions classes into multiple roles according to 
their referencing relationships. The sets $\rho_c^{-1}(r)$ of objects
with role r change during program execution, reflecting the 
changing referencing relationships of objects. 

Role definitions give more information than a role refer- 
ence diagram. Slot constraints specify not only that objects 
of role $r_1$ can reference objects of role $r_2$ along field $f$, but
also give cardinalities on the number of references from other 
objects. In addition, role definitions include identity and 
acyclicity constraints, which are not present in role refer- 
ence diagrams. 

Property 9 Let $\rho_c$ be any valid role assignment. Define

$$G = \{\langle \rho_c(o_1), f, \rho_c(o_2) \rangle \mid \langle o_1, f, o_2 \rangle \in H_c\}$$

Then $G$ is a subgraph of RRD.

It follows from Property 9 that roles give an approximation
of may-reachability among heap objects.

Property 10 (May Reachability)
If there is a valid role assignment $\rho_c : \mathrm{nodes}(H_c) \to R_0$ such
that $\rho_c(o_1) \neq \rho_c(o_2)$ where $o_1, o_2 \in \mathrm{nodes}(H_c)$ and there is
no path from $\rho_c(o_1)$ to $\rho_c(o_2)$ in the role reference diagram
RRD, then there is no path from $o_1$ to $o_2$ in $H_c$.
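
Property 10 reduces a question about heap paths to a graph search over roles. The following Python sketch illustrates this with a breadth-first search. The edge set is a hand-transcribed subset of the scheduler diagram (null edges and prev edges are omitted for brevity), and the function name `role_reachable` is an assumption of this example.

```python
from collections import deque

def role_reachable(rrd, src, dst):
    """Breadth-first search over role reference diagram edges (r, f, r')."""
    seen, queue = {src}, deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return True
        for (a, _field, b) in rrd:
            if a == cur and b not in seen:
                seen.add(b)
                queue.append(b)
    return False

# Subset of the scheduler role reference diagram (assumed transcription).
rrd = {
    ("LiveHeader", "next", "LiveList"),
    ("LiveList", "next", "LiveList"),
    ("LiveList", "proc", "RunningProc"),
    ("LiveList", "proc", "SleepingProc"),
    ("RunningHeader", "next", "RunningProc"),
    ("RunningProc", "next", "RunningProc"),
    ("RunningProc", "next", "RunningHeader"),
    ("SleepingTree", "root", "SleepingProc"),
    ("SleepingProc", "left", "SleepingProc"),
    ("SleepingProc", "right", "SleepingProc"),
}

sleeping_reaches_running = role_reachable(rrd, "SleepingProc", "RunningProc")
header_reaches_running = role_reachable(rrd, "LiveHeader", "RunningProc")
print(sleeping_reaches_running, header_reaches_running)
```

Since no role path leads from SleepingProc to RunningProc, Property 10 implies that in any role consistent heap no sleeping process can reach a running process. The converse does not hold: a role path only means that a heap path may exist.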

The next property shows the advantage of explicitly speci- 
fying null references in role definitions. While the ability to 
specify acyclicity is provided by the acyclic constraint, it 
is also possible to indirectly specify must-cyclicity. 

Property 11 (Must Cyclicity)
Let $F_0 \subseteq F$ and let $R_{\mathrm{CYC}} \subseteq R$ be a set of nodes in the role
reference diagram RRD such that for every node $r \in R_{\mathrm{CYC}}$, if
$\langle r, f, r' \rangle \in \mathrm{RRD}$ then $r' \in R_{\mathrm{CYC}}$. If $\rho_c$ is a valid role
assignment for $H_c$, then every object $o_1 \in H_c$ with $\rho_c(o_1) \in R_{\mathrm{CYC}}$
is a member of a cycle in $H_c$ with edges from $F_0$.

The following property shows that roles can specify a form 
of must-reachability among the sets of objects with the same 
role. 

Property 12 (Downstream Path Termination)
Assume that for some set of fields $F_0 \subseteq F$ there are sets of
nodes $R_{\mathrm{INTER}} \subseteq R$, $R_{\mathrm{FINAL}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathrm{INTER}}$:

1. $F_0 \subseteq \mathrm{acyclic}(r)$

2. if $\langle r, f, r' \rangle \in \mathrm{RRD}$ for $f \in F_0$, then $r' \in R_{\mathrm{INTER}} \cup R_{\mathrm{FINAL}}$

Let $\rho_c$ be a valid role assignment for $H_c$. Then every path in
$H_c$ starting from an object $o_1$ with role $\rho_c(o_1) \in R_{\mathrm{INTER}}$ and
containing only edges labelled with $F_0$ is a prefix of a path
that terminates at some object $o_2$ with $\rho_c(o_2) \in R_{\mathrm{FINAL}}$.

Property 13 (Upstream Path Termination)
Assume that for some set of fields $F_0 \subseteq F$ there are sets of
nodes $R_{\mathrm{INTER}} \subseteq R$, $R_{\mathrm{INIT}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathrm{INTER}}$:

1. $F_0 \subseteq \mathrm{acyclic}(r)$

2. if $\langle r', f, r \rangle \in \mathrm{RRD}$ for $f \in F_0$, then $r' \in R_{\mathrm{INTER}} \cup R_{\mathrm{INIT}}$

Let $\rho_c$ be a valid role assignment for $H_c$. Then every path
in $H_c$ terminating at an object $o_2$ with $\rho_c(o_2) \in R_{\mathrm{INTER}}$ and
containing only edges labelled with $F_0$ is a suffix of a path
which started at some object $o_1$, where $\rho_c(o_1) \in R_{\mathrm{INIT}}$.

The next two properties guarantee reachability properties
by which there must exist at least one path in the heap,
rather than stating properties of all paths as in Properties
12 and 13.

Property 14 (Downstream Must Reachability)
Assume that for some set of fields $F_0 \subseteq F$ there are sets of
roles $R_{\mathrm{INTER}} \subseteq R$, $R_{\mathrm{FINAL}} \subseteq R_0$ of the role reference diagram
RRD such that for every node $r \in R_{\mathrm{INTER}}$:

1. $F_0 \subseteq \mathrm{acyclic}(r)$

2. there exists $f \in F_0$ such that $\mathrm{field}_f(r) \subseteq R_{\mathrm{INTER}} \cup R_{\mathrm{FINAL}}$

Let $\rho_c$ be a valid role assignment for $H_c$. Then for every
object $o_1$ with $\rho_c(o_1) \in R_{\mathrm{INTER}}$ there is a path in $H_c$ with edges
from $F_0$ from $o_1$ to some object $o_2$ where $\rho_c(o_2) \in R_{\mathrm{FINAL}}$.

Property 15 (Upstream Must Reachability)
Assume that for some set of fields $F_0 \subseteq F$ there are sets
of nodes $R_{\mathrm{INTER}} \subseteq R$, $R_{\mathrm{INIT}} \subseteq R$ of the role reference diagram
RRD such that for every node $r \in R_{\mathrm{INTER}}$:

1. $F_0 \subseteq \mathrm{acyclic}(r)$

2. there exists $k$ such that $\mathrm{slot}_k(r) \subseteq (R_{\mathrm{INTER}} \cup R_{\mathrm{INIT}}) \times F$

Let $\rho_c$ be a valid role assignment for $H_c$. Then for every
object $o_2$ with $\rho_c(o_2) \in R_{\mathrm{INTER}}$ there is a path in $H_c$ from
some object $o_1$ with $\rho_c(o_1) \in R_{\mathrm{INIT}}$ to the object $o_2$.

Trees are a class of data structures especially suited for 
static analysis. Roles can express graphs that are not trees, 
but it is useful to identify trees as certain sets of mutually 
recursive role definitions. 

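Property 16 (Treeness)
Let $R_{\mathrm{TREE}} \subseteq R$ be a set of roles and $F_0 \subseteq F$ a set of fields such
that for every $r \in R_{\mathrm{TREE}}$:

1. $F_0 \subseteq \mathrm{acyclic}(r)$

2. $|\{i \mid \mathrm{slot}_i(r) \cap (R_{\mathrm{TREE}} \times F_0) \neq \emptyset\}| \leq 1$

Let $\rho_c$ be a valid role assignment for $H_c$ and $S \subseteq
\{\langle n_1, f, n_2 \rangle \mid \langle n_1, f, n_2 \rangle \in H_c,\ \rho_c(n_1), \rho_c(n_2) \in R_{\mathrm{TREE}},\ f \in F_0\}$.
Then $S$ is a set of trees.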
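
The two premises of Property 16 are purely syntactic conditions on role definitions, so they can be checked without looking at any heap. The Python sketch below illustrates this, assuming roles are encoded as dictionaries with `slots` and `acyclic` entries; the function name `satisfies_treeness` is an assumption of this example. The two roles transcribe SleepingProc from Figure 2 and TwoNode from Example 4.

```python
def satisfies_treeness(roles, tree_roles, tree_fields):
    """Check the two premises of the treeness property for every role in tree_roles."""
    for r in tree_roles:
        spec = roles[r]
        # 1) all tree fields must be declared acyclic
        if not tree_fields <= spec["acyclic"]:
            return False
        # 2) at most one slot may be filled from a tree role through a tree field
        hits = sum(1 for slot in spec["slots"]
                   if any(src in tree_roles and f in tree_fields
                          for (src, f) in slot))
        if hits > 1:
            return False
    return True

roles = {
    "SleepingProc": {
        "slots": [{("SleepingProc", "left"), ("SleepingProc", "right"),
                   ("SleepingTree", "root")},
                  {("LiveList", "proc")}],
        "acyclic": {"left", "right"},
    },
    "TwoNode": {
        "slots": [{("OneNode", "one"), ("TwoNode", "one"), ("SkipList", "one")},
                  {("TwoNode", "two"), ("SkipList", "two")}],
        "acyclic": {"one", "two"},
    },
}

sleeping_is_tree = satisfies_treeness(roles, {"SleepingProc"}, {"left", "right"})
two_level_is_tree = satisfies_treeness(roles, {"TwoNode"}, {"one", "two"})
upper_level_is_tree = satisfies_treeness(roles, {"TwoNode"}, {"two"})
print(sleeping_is_tree, two_level_is_tree, upper_level_is_tree)
```

The sleeping processes pass the check even though each of them has an extra alias from a LiveList cell, because that alias comes from outside the tree roles. The TwoNode role fails for both fields together, since a node can be referenced through both one and two, but passes when only the two field is considered.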
5 A Programming Model

In this section we define what it means for an execution of 
a program to respect the role constraints. This definition 
is complicated by the need to allow the program to tem- 
porarily violate the role constraints during data structure 
manipulations. Our approach is to let the program violate 
the constraints for objects referenced by local variables or 
parameters, but require all other objects to satisfy the con- 
straints. 

We first present a simple imperative language with dy- 
namic object allocation and give its operational semantics. 
We then specify additional statement preconditions that en- 
force the role consistency requirements. 

5.1 A Simple Imperative Language 

Our core language contains, as basic statements, Load
(x=y.f), Store (x.f=y), Copy (x=y), and New (x=new). All
variables are references to objects in the global heap and all
assignments are reference assignments. We use an elementary
test statement combined with nondeterministic choice
and iteration to express if and while statements, using the
usual translation [22]. We represent the control flow of
programs using control-flow graphs.

A program is a collection of procedures $proc \in \mathrm{Proc}$.
Procedures change the global heap but do not return values.
Every procedure $proc$ has a list of parameters $\mathrm{param}(proc) =
\{\mathrm{param}_k(proc)\}_k$ and a list of local variables $\mathrm{local}(proc)$. We
use $\mathrm{var}(proc)$ to denote $\mathrm{param}(proc) \cup \mathrm{local}(proc)$. A procedure
definition specifies the initial role $\mathrm{preR}_k(proc)$ and the
final role $\mathrm{postR}_k(proc)$ for every parameter $\mathrm{param}_k(proc)$. We
use $proc_j$ for indices $j \in \mathbb{N}$ to denote activation records of
procedure $proc$. We further assume that there are no modifications
of parameter variables so every parameter references
the same object throughout the lifetime of procedure
activation.

Example 17 The following kill procedure removes a pro- 
cess from both the doubly linked list of running processes 
and the list of all active processes. This is indicated by the 
transition from RunningProc to DeadProc. 

procedure kill(p : RunningProc ->> DeadProc,
               l : LiveHeader)
local prev, current, cp, nxt, lp, ln;
{
    // find 'p' in 'l'
    prev = l; current = l.next;
    cp = current.proc;
    while (cp != p) {
        prev = current;
        current = current.next;
        cp = current.proc;
    }
    // remove 'current' and 'p' from active list
    nxt = current.next;
    prev.next = nxt; current.next = null;
    current.proc = null;
    setRole(current : IsolatedCell);
    // remove 'p' from running list
    lp = p.prev; ln = p.next;
    p.prev = null; p.next = null;
    lp.next = ln; ln.prev = lp;
    setRole(p : DeadProc);
}



5.2 Operational Semantics 

In this section we give the operational semantics for our language.
We focus on the first three columns in Figures 6 and
7; the safety conditions in the fourth column are detailed in
a later section.

Figure 6 gives the small-step operational semantics for
the basic statements. We use $A \uplus B$ to denote the union
$A \cup B$ where the sets $A$ and $B$ are disjoint. The program
state consists of the stack $s$ and the concrete heap $H_c$. The
stack $s$ is a sequence of pairs $p@proc_j \in N_{\mathrm{CFG}}(proc) \times (\mathrm{Proc} \times \mathbb{N})$, where
$p \in N_{\mathrm{CFG}}(proc)$ is a program point, and $proc_j \in \mathrm{Proc} \times \mathbb{N}$
is an activation record of procedure $proc$. Program points
$p \in N_{\mathrm{CFG}}(proc)$ are nodes of the control-flow graphs. There
is one control-flow graph for every procedure $proc$. An edge
of the control-flow graph $\langle p, p' \rangle \in E_{\mathrm{CFG}}(proc)$ indicates that
control may transfer from point $p$ to point $p'$. We write
$p : \mathit{stat}$ to state that program point $p$ contains a statement
$\mathit{stat}$. The control-flow graph of each procedure contains special
program points entry and exit indicating procedure entry
and exit, with no statements associated with them. We
assume that all conditions are of the form x==y or !(x==y)
where x and y are either variables or a special constant null
which always points to the $\mathrm{null}_c$ object.

The concrete heap is either an error heap $\mathrm{error}_c$ or a non-error
heap. A non-error heap $H_c \subseteq N \times F \times N \cup ((\mathrm{Proc} \times \mathbb{N}) \times
V \times N)$ is a directed graph with labelled edges, where nodes
represent objects and procedure activation records, whereas
edges represent heap references and local variables. An edge
$\langle o_1, f, o_2 \rangle \in N \times F \times N$ denotes a reference from object $o_1$
to object $o_2$ via field $f \in F$. An edge $\langle proc_j, x, o \rangle \in H_c$
means that local variable x in activation record $proc_j$ points
to object $o$.

A load statement x=y.f makes the variable x point to
node $o_f$, which is referenced by the f field of object $o_y$,
which is in turn referenced by variable y. A store statement
x.f=y replaces the reference along field f in object $o_x$ by
a reference to object $o_y$ that is referenced by y. The copy
statement x=y copies a reference to object $o_y$ into variable
x. The statement x=new creates a new object $o_n$ with all
fields initially referencing $\mathrm{null}_c$, and makes x point to $o_n$.
The statement test(c) allows execution to proceed only if
condition c is satisfied.
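
The heap updates performed by the four basic statements can be mimicked with two dictionaries. The Python sketch below is a simplified illustration and not the formal semantics: it models a single activation record, ignores program points and the role consistency conditions, and raises a `KeyError` where the formal semantics has no transition (for example when dereferencing the null object). The field set `FIELDS`, the object naming scheme, and all function names are assumptions of this example.

```python
import itertools

FIELDS = ("next", "prev")      # assumed global field set F for this example
NULL = "null_c"                # assumed encoding of the null object
_fresh = itertools.count()     # source of fresh object names

def load(env, heap, x, y, f):
    """x = y.f : x points to the object referenced by field f of y's object."""
    env[x] = heap[(env[y], f)]

def store(env, heap, x, f, y):
    """x.f = y : replace the reference along f in x's object by y's object."""
    heap[(env[x], f)] = env[y]

def copy(env, heap, x, y):
    """x = y : copy the reference held by y into x."""
    env[x] = env[y]

def new(env, heap, x):
    """x = new : allocate a fresh object with all fields referencing null."""
    o = f"o{next(_fresh)}"
    for f in FIELDS:
        heap[(o, f)] = NULL
    env[x] = o

env, heap = {}, {}
new(env, heap, "x")
new(env, heap, "y")
store(env, heap, "x", "next", "y")
load(env, heap, "z", "x", "next")
copy(env, heap, "w", "z")
print(env["z"] == env["y"], env["w"] == env["y"])
```

Here `env` plays the role of the edges from the activation record to objects, and `heap` holds the field edges between objects, so both together correspond to the concrete heap $H_c$ restricted to one procedure activation.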

Figure 7 describes the semantics of procedure calls. A procedure
call pushes a new activation record onto the stack, inserts
it into the heap, and initializes the parameters. Procedure
entry initializes local variables. Procedure exit removes the
activation record from the heap and the stack.



5.3 Onstage and Offstage Objects 

At every program point the set of all objects of heap He can 
be partitioned into: 
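
Statement: p : x=y.f
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \uplus \{\langle \mathrm{proc}_i, x, o_x \rangle\} \rangle \to \langle p'@\mathrm{proc}_i; s, H'_c \rangle$
  Constraints:      $x, y \in \mathrm{local}(\mathrm{proc})$,
                    $\langle \mathrm{proc}_i, y, o_y \rangle, \langle o_y, f, o_f \rangle \in H_c$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $H'_c = H_c \uplus \{\langle \mathrm{proc}_i, x, o_f \rangle\}$
  Role consistency: $\mathrm{accessible}(o_f, \mathrm{proc}_i, H_c)$,
                    $\mathrm{con}(H'_c, \mathrm{offstage}(H'_c))$

Statement: p : x.f=y
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \uplus \{\langle o_x, f, o_f \rangle\} \rangle \to \langle p'@\mathrm{proc}_i; s, H'_c \rangle$
  Constraints:      $x, y \in \mathrm{local}(\mathrm{proc})$,
                    $\langle \mathrm{proc}_i, x, o_x \rangle, \langle \mathrm{proc}_i, y, o_y \rangle \in H_c$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $H'_c = H_c \uplus \{\langle o_x, f, o_y \rangle\}$
  Role consistency: $o_f \in \mathrm{onstage}(H_c, \mathrm{proc}_i)$,
                    $\mathrm{con}(H'_c, \mathrm{offstage}(H'_c))$

Statement: p : x=y
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \uplus \{\langle \mathrm{proc}_i, x, o_x \rangle\} \rangle \to \langle p'@\mathrm{proc}_i; s, H'_c \rangle$
  Constraints:      $x \in \mathrm{local}(\mathrm{proc})$, $y \in \mathrm{var}(\mathrm{proc})$,
                    $\langle \mathrm{proc}_i, y, o_y \rangle \in H_c$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $H'_c = H_c \uplus \{\langle \mathrm{proc}_i, x, o_y \rangle\}$
  Role consistency: $\mathrm{con}(H'_c, \mathrm{offstage}(H'_c))$

Statement: p : x=new
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \uplus \{\langle \mathrm{proc}_i, x, o_x \rangle\} \rangle \to \langle p'@\mathrm{proc}_i; s, H'_c \rangle$
  Constraints:      $x \in \mathrm{local}(\mathrm{proc})$, $o_n$ fresh,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $H'_c = H_c \uplus \{\langle \mathrm{proc}_i, x, o_n \rangle\} \uplus \mathrm{nulls}$,
                    $\mathrm{nulls} = \{o_n\} \times F \times \{\mathrm{null}\}$
  Role consistency: $\mathrm{con}(H'_c, \mathrm{offstage}(H'_c))$

Statement: p : test(c)
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \rangle \to \langle p'@\mathrm{proc}_i; s, H_c \rangle$
  Constraints:      $\mathrm{satisfied}_c(c, \mathrm{proc}_i, H_c)$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$
  Role consistency: $\mathrm{con}(H_c, \mathrm{offstage}(H_c))$

$\mathrm{satisfied}_c(\texttt{x==y}, \mathrm{proc}_i, H_c)$ iff $\{o \mid \langle \mathrm{proc}_i, x, o \rangle \in H_c\} = \{o \mid \langle \mathrm{proc}_i, y, o \rangle \in H_c\}$
$\mathrm{satisfied}_c(\texttt{!(x==y)}, \mathrm{proc}_i, H_c)$ iff not $\mathrm{satisfied}_c(\texttt{x==y}, \mathrm{proc}_i, H_c)$

$\mathrm{accessible}(o, \mathrm{proc}_i, H_c) := (\exists p \in \mathrm{param}(\mathrm{proc}) : \langle \mathrm{proc}_i, p, o \rangle \in H_c)$
    or not $(\exists \mathrm{proc}'_j\ \exists v \in \mathrm{var}(\mathrm{proc}') : \langle \mathrm{proc}'_j, v, o \rangle \in H_c)$

Figure 6: Semantics of Basic Statements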



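Statement: entry : _
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \rangle \to \langle p'@\mathrm{proc}_i; s, H_c \uplus \mathrm{nulls} \rangle$
  Constraints:      $\mathrm{nulls} = \{\langle \mathrm{proc}_i, v, \mathrm{null}_c \rangle \mid v \in \mathrm{local}(\mathrm{proc})\}$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$
  Role consistency: $\mathrm{con}(H_c, \mathrm{offstage}(H_c))$

Statement: p : $\mathrm{proc}'(x_k)_k$
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \rangle \to \langle \mathrm{entry}@\mathrm{proc}'_j; p'@\mathrm{proc}_i; s, H'_c \rangle$
  Constraints:      $j$ fresh in $p@\mathrm{proc}_i; s$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $\forall k : \langle \mathrm{proc}_i, x_k, o_k \rangle \in H_c$,
                    $H'_c = H_c \uplus \{\langle \mathrm{proc}'_j, p_k, o_k \rangle\}_k$,
                    $\forall k : p_k = \mathrm{param}_k(\mathrm{proc}')$
  Role consistency: $\mathrm{conW}(ra, H_c, S)$,
                    $ra = \{\langle o_k, \mathrm{preR}_k(\mathrm{proc}') \rangle\}_k$,
                    $S = \mathrm{offstage}(H_c) \cup \{o_k\}_k$

Statement: exit : _
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \rangle \to \langle s, H_c \setminus AF \rangle$
  Constraints:      $AF = \{\langle \mathrm{proc}_i, v, n \rangle \mid \langle \mathrm{proc}_i, v, n \rangle \in H_c\}$
  Role consistency: $\mathrm{conW}(ra, H_c, S)$,
                    $ra = \{\langle \mathrm{parnd}_k(\mathrm{proc}_i), \mathrm{postR}_k(\mathrm{proc}) \rangle\}_k$,
                    $S = \mathrm{offstage}(H_c) \cup \{o \mid \langle \mathrm{proc}_i, v, o \rangle \in H_c\}$

$\mathrm{parnd}_k(\mathrm{proc}_i) = o$ where $\langle \mathrm{proc}_i, \mathrm{param}_k(\mathrm{proc}), o \rangle \in H_c$

Figure 7: Semantics of Procedure Call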



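1. onstage objects ($\mathrm{onstage}(H_c)$) referenced by a local
variable or parameter of some activation frame;

$\mathrm{onstage}(H_c, \mathrm{proc}_j) := \{o \mid \exists x \in \mathrm{var}(\mathrm{proc}) : \langle \mathrm{proc}_j, x, o \rangle \in H_c\}$
$\mathrm{onstage}(H_c) := \bigcup_{\mathrm{proc}_j} \mathrm{onstage}(H_c, \mathrm{proc}_j)$

2. offstage objects ($\mathrm{offstage}(H_c)$) unreferenced by local
or parameter variables.

$\mathrm{offstage}(H_c) := \mathrm{nodes}(H_c) \setminus \mathrm{onstage}(H_c)$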

Onstage objects need not have correct roles. Offstage objects 
must have correct roles assuming some role assignment for 
onstage objects; the exception is that acyclicity constraints 
for offstage objects can be violated due to cycles that pass 
through the onstage objects. 
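
As a small illustration of the partition, the following Python sketch computes the onstage and offstage sets for a heap given as a set of (source, label, target) triples. The representation and the explicit `frames` argument (the set of activation records) are assumptions made for this example only.

```python
def onstage(heap, frames):
    """Objects referenced by a variable of some activation record."""
    return {o for (s, _, o) in heap if s in frames}

def heap_nodes(heap, frames):
    """All objects in the heap, excluding the activation records themselves."""
    sources = {s for (s, _, _) in heap}
    targets = {o for (_, _, o) in heap}
    return (sources | targets) - set(frames)

def offstage(heap, frames):
    """Objects not referenced by any local variable or parameter."""
    return heap_nodes(heap, frames) - onstage(heap, frames)
```

In a three-element list where only the first and last elements are referenced by variables, the middle element is the only offstage object.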

Definition 18 Given a set of role definitions and a set of
objects $S_c \subseteq \mathrm{nodes}(H_c)$, we say that heap $H_c$ is role
consistent for $S_c$, and we write $\mathrm{con}(H_c, S_c)$, iff there exists a
role assignment $\rho_c : \mathrm{nodes}(H_c) \to R_0$ such that the predicate
$\mathrm{locallyConsistent}(o, H_c, \rho_c, S_c)$ is satisfied for every
object $o \in S_c$.

We define $\mathrm{locallyConsistent}(o, H_c, \rho_c, S_c)$ to generalize the
$\mathrm{locallyConsistent}(o, H_c, \rho_c)$ predicate, weakening the
acyclicity condition.

Definition 19 $\mathrm{locallyConsistent}(o, H_c, \rho_c, S_c)$ holds iff
conditions 1), 2), and 3) of the definition of
$\mathrm{locallyConsistent}(o, H_c, \rho_c)$ are satisfied and the
following condition holds:

4') It is not the case that graph $H_c$ contains a cycle
$o_1, f_1, \ldots, o_s, f_s, o_1$ such that
$o_1 = o$, $f_1, \ldots, f_s \in \mathrm{acyclic}(r)$, and
additionally $o_1, \ldots, o_s \in S_c$.

Here $S_c$ is the set of onstage objects that are not allowed
to create a cycle; objects in $\mathrm{nodes}(H_c) \setminus S_c$ are exempt from
the acyclicity condition. The $\mathrm{locallyConsistent}(o, H_c, \rho_c, S_c)$
and $\mathrm{con}(H_c, S_c)$ predicates are monotonic in $S_c$, so a larger
$S_c$ implies a stronger invariant. For $S_c = \mathrm{nodes}(H_c)$,
consistency for $S_c$ is equivalent to the heap consistency defined
earlier. Note that the role assignment $\rho_c$ specifies roles even
for objects $o \in \mathrm{nodes}(H_c) \setminus S_c$. This is because the role of
o may influence the role consistency of objects in $S_c$ which
are adjacent to o.
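
The weakened acyclicity condition 4') can be read as a restricted cycle search. The sketch below is a hedged illustration in Python: the heap is assumed to be a set of (source, field, target) triples, `acyclic_fields` stands for acyclic(r) of the object's role, and `allowed` stands for the set S_c. The function name is hypothetical.

```python
def violates_acyclicity(heap, o, acyclic_fields, allowed):
    """True iff `o` lies on a cycle that uses only fields in `acyclic_fields`
    and whose objects all belong to `allowed` (the set S_c)."""
    if o not in allowed:
        return False
    seen, stack = set(), [o]
    while stack:
        n = stack.pop()
        for (s, f, t) in heap:
            if s == n and f in acyclic_fields and t in allowed:
                if t == o:
                    return True  # closed a cycle back to o
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
    return False
```

A cycle that passes through an object outside `allowed` is not reported, which mirrors the exemption for cycles through onstage objects when S_c is the offstage set.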

At procedure calls, the role declarations for parameters
restrict the set of potential role assignments. We therefore
generalize $\mathrm{con}(H_c, S_c)$ to $\mathrm{conW}(ra, H_c, S_c)$, which restricts
the set of role assignments $\rho_c$ considered for heap
consistency.

Definition 20 Given a set of role definitions, a heap $H_c$, a
set $S_c \subseteq \mathrm{nodes}(H_c)$, and a partial role assignment $ra \subseteq S_c \times R$,
we say that the heap $H_c$ is consistent with ra for $S_c$, and
write $\mathrm{conW}(ra, H_c, S_c)$, iff there exists a (total) role
assignment $\rho_c : \mathrm{nodes}(H_c) \to R_0$ such that $ra \subseteq \rho_c$ and for every
object $o \in S_c$ the predicate $\mathrm{locallyConsistent}(o, H_c, \rho_c, S_c)$ is
satisfied.



5.4 Role Consistency 

We are now able to state precisely the role consistency
requirements that must be satisfied for program execution.
The role consistency requirements are in the fourth column of
Figures 6 and 7. We assume the operational semantics is
extended with transitions leading to a program state with
heap $\mathrm{error}_c$ whenever role consistency is violated.

5.4.1 Offstage Consistency 

At every program point, we require $\mathrm{con}(H_c, \mathrm{offstage}(H_c))$ to
be satisfied. This means that offstage objects have correct
roles, but onstage objects may have their role temporarily
violated.

5.4.2 Reference Removal Consistency 

The Store statement x.f=y has the following safety
precondition. When a reference $\langle o_x, f, o_f \rangle \in H_c$, with
$\langle \mathrm{proc}_j, x, o_x \rangle \in H_c$, is removed from the heap, both $o_x$
and $o_f$ must be referenced from the current procedure
activation record. It is sufficient to verify this condition for
$o_f$, as $o_x$ is already onstage by definition. The reference
removal consistency condition enables the completion of the
role change for $o_f$ after the reference $\langle o_x, f, o_f \rangle$ is removed
and ensures that heap references are introduced and removed
only between onstage objects.

5.4.3 Procedure Call Consistency 

Our programming model ensures role consistency across
procedure calls using the following protocol.

A procedure call $\mathrm{proc}'(x_1, \ldots, x_p)$ in Figure 7 requires the
role consistency precondition $\mathrm{conW}(ra, H_c, S_c)$, where the
partial role assignment ra requires objects $o_k$, corresponding
to parameters $x_k$, to have roles $\mathrm{preR}_k(\mathrm{proc}')$ expected by the
callee, and $S_c = \mathrm{offstage}(H_c) \cup \{o_k\}_k$ for $\langle \mathrm{proc}_i, x_k, o_k \rangle \in H_c$.

To ensure that the callee $\mathrm{proc}'_j$ never observes incorrect
roles, we impose an accessibility condition for the callee's
Load statements (see the fourth column of Figure 6). The
accessibility condition prohibits access to any object o
referenced by some local variable of a stack frame other than
$\mathrm{proc}'_j$, unless o is referenced by some parameter of $\mathrm{proc}'_j$.
Provided that this condition is not violated, the callee $\mathrm{proc}'_j$
only accesses objects with correct roles, even though objects
that it does not access may have incorrect roles. In Section 7
we show how the role analysis ensures that the accessibility
condition is never violated.
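
The accessibility condition can be sketched as a predicate over the heap. The Python below follows the prose statement above (frames other than the callee); the triple-based heap, the `frame_vars` dictionary mapping each activation record to its variable names, and the function signature are assumptions for illustration.

```python
def accessible(heap, o, callee, params, frame_vars):
    """Sketch of the accessibility check for a Load performed by `callee`.

    heap       : set of (source, label, target) triples
    params     : names of the callee's parameters
    frame_vars : dict mapping each activation record to its variable names
    """
    # o is always accessible if a parameter of the callee references it
    if any((callee, p, o) in heap for p in params):
        return True
    # otherwise no variable of another stack frame may reference o
    return not any(
        (frame, v, o) in heap
        for frame, names in frame_vars.items()
        if frame != callee
        for v in names
    )
```

Objects that are purely offstage (referenced by no frame at all) pass the check, which is what lets the callee traverse the part of the heap whose roles are known to be correct.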

At the procedure exit point (Figure 7), we require
correct roles for all objects referenced by the current activation
frame $\mathrm{proc}'_j$. This implies that heap operations performed
by $\mathrm{proc}'_j$ preserve heap consistency for all objects accessed
by $\mathrm{proc}'_j$.

5.4.4 Explicit Role Check 

The programmer can specify a stronger invariant at any
program point using the statement $\mathrm{roleCheck}(x_1, \ldots, x_n, ra)$. As
Figure 8 indicates, roleCheck requires the $\mathrm{conW}(ra, H_c, S_c)$
predicate to be satisfied for the supplied partial role
assignment ra, where $S_c = \mathrm{offstage}(H_c) \cup \{o_k\}_k$ for objects $o_k$
referenced by the given local variables $x_k$.






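Statement: p : $\mathrm{roleCheck}(x_1, \ldots, x_n, ra)$
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \rangle \to \langle p'@\mathrm{proc}_i; s, H_c \rangle$
  Constraints:      $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$
  Role consistency: $\mathrm{conW}(ra, H_c, S)$,
                    $S = \mathrm{offstage}(H_c) \cup \{o \mid \langle \mathrm{proc}_i, x_k, o \rangle \in H_c\}$

Figure 8: Operational Semantics of Explicit Role Check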



5.5 Instrumented Semantics 

We expect the programmer to have a specific role assignment 
in mind when writing the program, with this role assignment 
changing as the statements of the program change the ref- 
erencing relationships. So when the programmer wishes to 
change the role of an object, he or she writes a program that 
brings the object onstage, changes its referencing relation- 
ships so that it plays a new role, then puts it offstage in its 
new role. The roles of other objects do not change.^1

To support these programmer expectations, we introduce 
an augmented programming model in which the role assign- 
ment pc is conceptually part of the program's state. The 
role assignment changes only if the programmer changes it 
explicitly using the setRole statement. The augmented pro- 
gramming model has an underlying instrumented semantics 
as opposed to the original semantics. 

Example 21 The original semantics allows asserting
different roles at different program points even if the structure of
the heap was not changed, as in the following procedure foo.

role A1 { fields f : B1; }
role B1 { slots A1.f; }
role A2 { fields f : B2; }
role B2 { slots A2.f; }
procedure foo()
var x, y;
{
    x = new; y = new;
    x.f = y;
    roleCheck(x, y, x:A1, y:B1);
    roleCheck(x, y, x:A2, y:B2);
}

Both role checks would succeed since each of the
specified partial role assignments can be extended to a
valid role assignment. On the other hand, the check
roleCheck(x, y, x:A1, y:B2) would fail.

The procedure foo in the instrumented semantics can be
written as follows.

procedure foo()
var x, y;
{
    x = new; y = new;
    x.f = y;
    setRole(x:A1); setRole(y:B1);
    roleCheck(x, y, x:A1, y:B1);
    setRole(x:A2); setRole(y:B2);
    roleCheck(x, y, x:A2, y:B2);
}

The setRole statement makes the role change of an object
explicit.

^1 An extension to the programming model supports
cascading role changes in which a single role change propagates
through the heap changing the roles of offstage objects, see
Section 10.

The instrumented semantics extends the concrete heap
$H_c$ with a role assignment $\rho_c$. Figure 9 outlines the changes
in the instrumented semantics with respect to the original
semantics. We introduce a new statement setRole(x:r),
which modifies the role assignment $\rho_c$, giving $\rho_c[o_x \mapsto r]$,
where $o_x$ is the object referenced by x. All statements
other than setRole preserve the current role assignment.
For every consistency condition $\mathrm{conW}(ra, H_c, S_c)$ in the
original semantics, the instrumented semantics uses the
corresponding condition $\mathrm{conW}(\rho_c \cup ra, H_c, S_c)$ and fails if $\rho_c$
is not an extension of ra. Here we consider $\mathrm{con}(H_c, S)$
to be a shorthand for $\mathrm{conW}(\emptyset, H_c, S)$. For example, the
new role consistency condition for the Copy statement
x=y is $\mathrm{conW}(\rho_c, H_c, \mathrm{offstage}(H_c))$. The New statement
assigns an identifier unknown to the newly created object $o_n$.
By definition, a node with role unknown does not satisfy the
locallyConsistent predicate. This means that setRole must
be used to set a valid role for $o_n$ before $o_n$ moves offstage.

By introducing an instrumented semantics we are not sug- 
gesting an implementation that explicitly stores roles of ob- 
jects at run-time. We instead use the instrumented seman- 
tics as the basis of our role analysis and ensure that all role 
checks can be statically removed. Because the instrumented 
semantics is more restrictive than the original semantics, our 
role analysis is a conservative approximation of both the in- 
strumented semantics and the original semantics. 

6 Intraprocedural Role Analysis 

This section presents an intraprocedural role analysis algo- 
rithm. The goal of the role analysis is to statically verify 
the role consistency requirements described in the previous 
section. 

The key observation behind our analysis algorithm is that 
we can incrementally verify role consistency of the concrete 
heap He by ensuring role consistency for every node when it 
goes offstage. This allows us to represent the statically un- 
bounded offstage portion of the heap using summary nodes 
with "may" references. In contrast, we use a "must" in- 
terpretation for references from and to onstage nodes. The 
exact representation of onstage nodes allows the analysis to 
verify role consistency in the presence of temporary viola- 
tions of role constraints. 

Statement: p : x=new
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c \uplus \{\langle \mathrm{proc}_i, x, o_x \rangle\}, \rho_c \rangle \to \langle p'@\mathrm{proc}_i; s, H'_c, \rho'_c \rangle$
  Constraints:      $x \in \mathrm{local}(\mathrm{proc})$, $o_n$ fresh,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$,
                    $H'_c = H_c \uplus \{\langle \mathrm{proc}_i, x, o_n \rangle\} \uplus \{o_n\} \times F \times \{\mathrm{null}\}$,
                    $\rho'_c = \rho_c[o_n \mapsto \mathrm{unknown}]$
  Role consistency: $\mathrm{conW}(\rho'_c, H'_c, \mathrm{offstage}(H'_c))$

Statement: p : setRole(x:r)
  Transition:       $\langle p@\mathrm{proc}_i; s, H_c, \rho_c \rangle \to \langle p'@\mathrm{proc}_i; s, H_c, \rho'_c \rangle$
  Constraints:      $x \in \mathrm{local}(\mathrm{proc})$,
                    $\langle \mathrm{proc}_i, x, o_x \rangle \in H_c$,
                    $\rho'_c = \rho_c[o_x \mapsto r]$,
                    $\langle p, p' \rangle \in E_{CFG}(\mathrm{proc})$
  Role consistency: $\mathrm{conW}(\rho'_c, H_c, \mathrm{offstage}(H_c))$

Statement: p : stat
  Transition:       $\langle s, H_c, \rho_c \rangle \to \langle s', H'_c, \rho_c \rangle$
  Constraints:      $\langle s, H_c \rangle \to \langle s', H'_c \rangle$
  Role consistency: $P \wedge \mathrm{conW}(\rho_c \cup ra, H'_c, S)$
                    for every original condition $P \wedge \mathrm{conW}(ra, H'_c, S)$

Figure 9: Instrumented Semantics

Our analysis representation is a graph in which nodes
represent objects and edges represent references between
objects. There are two kinds of nodes: onstage nodes, with
each onstage node representing one onstage object; and
offstage nodes, with each offstage node corresponding to a
set of objects that play that role.
To increase the precision of the analysis, the algorithm
occasionally generates multiple offstage nodes that represent
disjoint sets of objects playing the same role. Distinct
offstage nodes with the same role r represent disjoint sets of
objects of role r with different reachability properties from
onstage nodes.

We frame role analysis as a data-flow analysis operating
on a distributive lattice $\mathcal{P}(\mathrm{RoleGraphs})$ of sets of role graphs
with set union $\cup$ as the join operator. In this section we
present an algorithm for intraprocedural analysis. We use
$\mathrm{proc}_c$ to denote the topmost activation record in a concrete
heap $H_c$. In Section 7 we generalize the algorithm to the
compositional interprocedural analysis.

6.1 Abstraction Relation 

Every data-flow fact $\mathcal{G} \subseteq \mathrm{RoleGraphs}$ is a set of role graphs
$G \in \mathcal{G}$. Every role graph $G \in \mathrm{RoleGraphs}$ is either a
bottom role graph $\bot_G$ representing the set of all concrete heaps
(including $\mathrm{error}_c$), or a tuple $G = \langle H, \rho, K \rangle$ representing
non-error concrete heaps, where

• $H \subseteq N \times F \times N$ is the abstract heap with nodes N
representing objects and fields F. The abstract heap
H represents heap references $\langle n_1, f, n_2 \rangle$ and variables of
the currently analyzed procedure $\langle \mathrm{proc}, x, n \rangle$ where
$x \in \mathrm{local}(\mathrm{proc})$. Null references are represented as references
to the abstract node null. We define abstract onstage nodes
$\mathrm{onstage}(H) = \{n \mid \langle \mathrm{proc}, x, n \rangle \in H, x \in \mathrm{local}(\mathrm{proc}) \cup \mathrm{param}(\mathrm{proc})\}$
and abstract offstage nodes
$\mathrm{offstage}(H) = \mathrm{nodes}(H) \setminus \mathrm{onstage}(H) \setminus \{\mathrm{proc}, \mathrm{null}\}$.

• $\rho : \mathrm{nodes}(H) \to R_0$ is an abstract role assignment,
$\rho(\mathrm{null}) = \mathrm{null}_R$;

• $K : \mathrm{nodes}(H) \to \{i, s\}$ indicates the kind of each node;
when $K(n) = i$, then n is an individual node representing
at most one object, and when $K(n) = s$, n is a
summary node representing zero or more objects. We
require $K(\mathrm{proc}) = K(\mathrm{null}) = i$, and require all onstage
nodes to be individual, $K[\mathrm{onstage}(H)] \subseteq \{i\}$.

The abstraction relation $\alpha$ relates a pair $\langle H_c, \rho_c \rangle$ of
concrete heap and concrete role assignment with an abstract
role graph G.

Definition 22 We say that an abstract role graph G
represents concrete heap $H_c$ with role assignment $\rho_c$ and write
$\langle H_c, \rho_c \rangle \, \alpha \, G$, iff $G = \bot_G$ or: $H_c \neq \mathrm{error}_c$, $G = \langle H, \rho, K \rangle$,
and there exists a function $h : \mathrm{nodes}(H_c) \to \mathrm{nodes}(H)$ such
that

1) $H_c$ is role consistent: $\mathrm{conW}(\rho_c, H_c, \mathrm{offstage}(H_c))$,

2) identity constraints of onstage nodes with offstage nodes
hold: if $\langle o_1, f, o_2 \rangle \in H_c$ and $\langle o_2, g, o_3 \rangle \in H_c$ for
$o_1 \in \mathrm{onstage}(H_c)$, $o_2 \in \mathrm{offstage}(H_c)$, and
$\langle f, g \rangle \in \mathrm{identities}(\rho_c(o_1))$, then $o_3 = o_1$;

3) h is a graph homomorphism: if $\langle o_1, f, o_2 \rangle \in H_c$ then
$\langle h(o_1), f, h(o_2) \rangle \in H$;

4) an individual node represents at most one concrete
object: $K(n) = i$ implies $|h^{-1}(n)| \le 1$;

5) h is a bijection on edges which originate or terminate at
onstage nodes: if $\langle n_1, f, n_2 \rangle \in H$ and $n_1 \in \mathrm{onstage}(H)$
or $n_2 \in \mathrm{onstage}(H)$, then there exists exactly one
$\langle o_1, f, o_2 \rangle \in H_c$ such that $h(o_1) = n_1$ and $h(o_2) = n_2$;

6) $h(\mathrm{null}_c) = \mathrm{null}$ and $h(\mathrm{proc}_c) = \mathrm{proc}$;

7) the abstract role assignment $\rho$ corresponds to the
concrete role assignment: $\rho_c(o) = \rho(h(o))$ for every object
$o \in \mathrm{nodes}(H_c)$.

Note that the error heap $\mathrm{error}_c$ can be represented only by
the bottom role graph $\bot_G$. The analysis uses $\bot_G$ to indicate
a potential role error.
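
To make part of Definition 22 concrete, the following Python sketch checks conditions 3), 4), and 7) for a candidate mapping h. It is a partial check only: conditions 1), 2), 5), and 6) are not covered, and the dictionary-based encoding of h, K, and the role assignments is an assumption made for this example.

```python
from collections import Counter

def partially_represents(concrete_edges, abstract_edges, h, kind, rho_c, rho):
    """Check conditions 3), 4) and 7) of the abstraction relation for mapping h."""
    # 3) h is a graph homomorphism
    homomorphism = all(
        (h[s], f, h[t]) in abstract_edges for (s, f, t) in concrete_edges
    )
    # 4) an individual node represents at most one concrete object
    preimage_size = Counter(h.values())
    individuals_ok = all(
        preimage_size[n] <= 1 for n, k in kind.items() if k == "i"
    )
    # 7) abstract roles agree with concrete roles
    roles_ok = all(rho_c[o] == rho[h[o]] for o in h)
    return homomorphism and individuals_ok and roles_ok
```

Mapping two list objects to one summary node passes the check, while mapping them to a node marked individual fails condition 4).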

Condition 3) implies that role graph edges are a
conservative approximation of concrete heap references. These edges
are in general "may" edges. Hence it is possible for an
offstage node n that $\langle n, f, n_1 \rangle, \langle n, f, n_2 \rangle \in H$ for $n_1 \neq n_2$. This
cannot happen when $n \in \mathrm{onstage}(H)$ because of 5). Another
consequence of 5) is that an edge in H from an onstage node
$n_0$ to a summary node $n_s$ implies that $n_s$ represents at least
one object. Condition 2) strengthens 1) by requiring certain
identity constraints for onstage nodes to hold, as explained
in Section ETl

Figure 10: Abstraction Relation

Example 23 Consider the following role declaration for an
acyclic list.

role L { // List header
    fields first : LN | null;
}

role LN { // List node
    fields next : LN | null;
    slots LN.next | L.first;
    acyclic next;
}

Figure 10 shows a role graph and one of the concrete heaps
represented by the role graph via homomorphism h. There
are two local variables, prev and current, referencing
distinct onstage objects. Onstage objects are isomorphic to
onstage nodes in the role graph. In contrast, there are two
objects mapped to each of the summary nodes with role LN
(shown as LN-labelled rectangles in Figure 10). Note that the
sets of objects mapped to these two summary nodes are
disjoint. The first summary LN-node represents objects stored
in the list before the object referenced by prev. The second
summary LN-node represents objects stored in the list after
the object referenced by current.



Figure 11: Simulation Relation Between Abstract and Con- 
crete Execution 

6.2 Transfer Functions 

The key complication in developing the transfer functions
for the role analysis is to accurately model the movement
of objects onstage and offstage. For example, a load
statement x=y.f may cause the object referred to by y.f to move
onstage. In addition, if x was the only reference to an
onstage object o before the statement executed, object o moves
offstage after the execution of the load statement, and thus
must satisfy the locallyConsistent predicate.

The analysis uses an expansion relation $\preceq$ to model the
movement of objects onstage and a contraction relation $\succeq$
to model the movement of objects offstage. The expansion
relation uses the invariant that offstage nodes have correct
roles to generate possible aliasing relationships for the node
being pulled onstage. The contraction relation establishes
the role invariants for the node going offstage, allowing the
node to be merged into the other offstage nodes and
represented more compactly.

We present our role analysis as an abstract execution
relation $\stackrel{st}{\leadsto}$. The abstract execution ensures that the
abstraction relation $\alpha$ is a forward simulation relation [33] from
the space of concrete heaps with role assignments to the set
RoleGraphs. The simulation relation implies that the traces
of $\leadsto$ include the traces of the instrumented semantics $\to$.
To ensure that the program does not violate constraints
associated with roles, it is thus sufficient to guarantee that $\bot_G$
is not reachable via $\leadsto$.

To prove that $\bot_G$ is not reachable in the abstract
execution, the analysis computes for every program point p a set
of role graphs $\mathcal{G}$ that conservatively approximates the
possible program states at point p. The transfer function for a
statement st is the image $[\![st]\!](\mathcal{G}) = \{G' \mid G \in \mathcal{G}, G \stackrel{st}{\leadsto} G'\}$.

The analysis computes the relation $\stackrel{st}{\leadsto}$ in three steps:

1. ensure that the relevant nodes are instantiated using
the expansion relation $\preceq$ (Section 6.2.1);

2. perform symbolic execution $\stackrel{st}{\Longrightarrow}$ of the statement st
(Section 6.2.3);

3. merge nodes if needed using the contraction relation $\succeq$ to
keep the role graph bounded (Section 6.2.2).
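
The three steps compose into a transfer function that maps a set of role graphs to the set of all graphs reachable through expansion, symbolic execution, and contraction. The Python sketch below shows only this composition; `expand`, `execute`, and `contract` are hypothetical callables, each returning an iterable of successor graphs (possibly empty), and graphs are assumed to be hashable values.

```python
def transfer(graphs, expand, execute, contract):
    """Image of a set of role graphs under expansion, symbolic
    execution, and contraction applied in sequence."""
    result = set()
    for g in graphs:
        for g1 in expand(g):            # step 1: may yield several graphs
            for g2 in execute(g1):      # step 2: may yield none (failed test)
                result.update(contract(g2))  # step 3: merge nodes if needed
    return result
```

Because each step is a relation rather than a function, the result can be larger or smaller than the input set; an empty result for some input graph corresponds to a trace that cannot be extended.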

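Figure 11 shows how the abstraction relation $\alpha$ relates $\preceq$,
$\stackrel{st}{\Longrightarrow}$, and $\succeq$ with the concrete execution $\to$ in the instrumented
semantics. Assume that a concrete heap $\langle H_c, \rho_c \rangle$ is
represented by the role graph $G_1$. Then one of the role graphs $G_2$
obtained after expansion remains an abstraction of $\langle H_c, \rho_c \rangle$.
The symbolic execution followed by the contraction
relation $\succeq$ corresponds to the instrumented operational
semantics $\to$.

Transition: $\langle H, \rho, K \rangle \stackrel{x=y.f}{\leadsto} G'$
  Definition: $\langle H, \rho, K \rangle \stackrel{n_y, f}{\preceq} G_1 \stackrel{x=y.f}{\Longrightarrow} G_2 \stackrel{n_x}{\succeq} G'$
  Conditions: $\langle \mathrm{proc}, x, n_x \rangle, \langle \mathrm{proc}, y, n_y \rangle \in H$

Transition: $\langle H, \rho, K \rangle \stackrel{x=y}{\leadsto} G'$
  Definition: $\langle H, \rho, K \rangle \stackrel{x=y}{\Longrightarrow} G_1 \stackrel{n_1}{\succeq} G'$
  Conditions: $\langle \mathrm{proc}, x, n_1 \rangle \in H$

Transition: $\langle H, \rho, K \rangle \stackrel{x=new}{\leadsto} G'$
  Definition: $\langle H, \rho, K \rangle \stackrel{x=new}{\Longrightarrow} G_1 \stackrel{n_1}{\succeq} G'$
  Conditions: $\langle \mathrm{proc}, x, n_1 \rangle \in H$

Transition: $\langle H, \rho, K \rangle \stackrel{s}{\leadsto} G'$
  Definition: $\langle H, \rho, K \rangle \stackrel{s}{\Longrightarrow} G'$
  Conditions: $s \in \{$x.f=y, test(c), setRole(x:r), roleCheck($x_{1..p}$, ra)$\}$

Figure 12: Abstract Execution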

Figure 12 shows rules for the abstract execution relation
$\stackrel{st}{\leadsto}$. Only the Load statement uses the expansion relation,
because the other statements operate on objects that are
already onstage. Load, Copy, and New statements may
remove a local variable reference from an object, so they use
the contraction relation to move the object offstage if needed.
For the rest of the statements, the abstract execution
reduces to the symbolic execution $\stackrel{st}{\Longrightarrow}$ described in Section 6.2.3.

Nondeterminism and Failure The relation $\stackrel{st}{\leadsto}$ is not a
function because the expansion relation $\preceq$ can generate a set
of role graphs from a single role graph. Also, there might be
no $\stackrel{st}{\leadsto}$ transitions originating from a given state G if the
symbolic execution $\stackrel{st}{\Longrightarrow}$ produces no results. This corresponds
to a trace which cannot be extended further due to a test
statement which fails in state G. This is in contrast to a
transition from G to $\bot_G$, which indicates a potential role
consistency violation or a null pointer dereference. We assume
that the $\stackrel{st}{\Longrightarrow}$ and $\succeq$ relations contain the transition $\langle \bot_G, \bot_G \rangle$
to propagate the error role graph. In most cases we do not
write the explicit transitions to error states.

6.2.1 Expansion 

Figure 13 shows the expansion relation $\stackrel{n,f}{\preceq}$. Given a role
graph $\langle H, \rho, K \rangle$, expansion attempts to produce a set of role
graphs $\langle H', \rho', K' \rangle$ in each of which $\langle n, f, n_0 \rangle \in H'$ and
$K(n_0) = i$. Expansion is used in abstract execution of the
Load statement. It first checks for a null pointer dereference
and reports an error if the check fails. If $\langle n, f, n' \rangle \in H$ and
$K(n') = i$ already hold, the expansion returns the original
state. Otherwise, $\langle n, f, n' \rangle \in H$ with $K(n') = s$. In that
case, the summary node $n'$ is first instantiated using the
instantiation relation $\Uparrow$. Next, the split relation $\parallel$ is applied. Let
$\rho(n_0) = r$. The split relation ensures that $n_0$ is not a member
of any cycle of offstage nodes which contains only edges in
acyclic(r). We explain instantiation and split in more detail
below.

Instantiation Figure 14 presents the instantiation
relation. Given a role graph $G = \langle H, \rho, K \rangle$, instantiation $\stackrel{n_0}{\underset{n'}{\Uparrow}}$
generates the set of role graphs $\langle H', \rho', K' \rangle$ such that each
concrete heap represented by $\langle H, \rho, K \rangle$ is represented by one
of the graphs $\langle H', \rho', K' \rangle$. Each of the new role graphs
contains a fresh individual node $n_0$ that satisfies localCheck.
The edges of $n_0$ are a subset of the edges from and to $n'$.

Let $H_0$ be a subset of the references between $n'$ and
onstage nodes, and let $H_1$ be a subset of the references between
$n'$ and offstage nodes. References in $H_0$ are moved from $n'$
to the new node $n_0$, because they represent at most one
reference, while references in $H_1$ are copied to $n_0$ because they
may represent multiple concrete heap references. Moving a
reference is formalized via the swing operation in Figure 14.

The instantiation of a single graph can generate multiple
role graphs depending on the choice of $H_0$ and $H'_1$. The
number of graphs generated is limited by the existing references
of node $n'$ and by the localCheck requirement for $n_0$. This is
where our role analysis takes advantage of constraints
associated with role definitions to reduce the number of aliasing
possibilities that need to be considered.
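
The swing operation of Figure 14 can be written directly as a set comprehension. The following Python sketch assumes edges are (source, field, target) triples and mirrors the three components of the figure's definition; it is an illustration, not the analysis implementation.

```python
def swing(n_old, n_new, edges):
    """Redirect the endpoints of `edges` that touch n_old to n_new."""
    # edges leaving n_old now leave n_new
    out = {(n_new, f, t) for (s, f, t) in edges if s == n_old}
    # edges entering n_old now enter n_new
    out |= {(s, f, n_new) for (s, f, t) in edges if t == n_old}
    # self-loops on n_old also yield self-loops on n_new
    out |= {(n_new, f, n_new) for (s, f, t) in edges if s == n_old and t == n_old}
    return out
```

For instance, swinging the edges {(a, f, s), (s, g, b)} from summary node s to a fresh node n0 yields {(a, f, n0), (n0, g, b)}.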

Split The split relation is important for verifying
operations on data structures such as skip lists and sparse
matrices. It is also useful for improving the precision of the initial
set of role graphs on procedure entry (Section 7.2.1).

The goal of the split relation is to exploit the acyclicity 
constraints associated with role definitions. After a node no 
is brought onstage, split represents the acyclicity condition 
of p(no) explicitly by eliminating impossible paths in the 
role graph. It uses additional offstage nodes to encode the 
reachability information implied by the acyclicity conditions. 
This information can then be used even after the role of node 
no changes. In particular, it allows the acyclicity condition 
of no to be verified when no moves offstage. 

Example 24 Consider a role graph for an acyclic list with
nodes LN and a header node L. The instantiated node $n_0$ is
in the middle of the list. Figure 16 a) shows a role graph
with a single summary node representing all offstage
LN-nodes. Figure 16 b) shows the role graph after applying
the split relation. The resulting role graph contains two
LN summary nodes. The first LN summary node represents
objects definitely reachable from $n_0$ along next edges; the
second LN summary node represents objects definitely not
reachable from $n_0$.

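Transition: $\langle H, \rho, K \rangle \stackrel{n,f}{\preceq} \langle H, \rho, K \rangle$
  Condition:  $\langle n, f, n' \rangle \in H$, $n' \in \mathrm{onstage}(H)$

Transition: $\langle H, \rho, K \rangle \stackrel{n,f}{\preceq} G'$
  Definition: $\langle H, \rho, K \rangle \stackrel{n_0}{\underset{n'}{\Uparrow}} \langle H_1, \rho_1, K_1 \rangle \stackrel{n_0}{\parallel} G'$
  Condition:  $\langle n, f, n' \rangle \in H$, $n' \in \mathrm{offstage}(H)$,
              $\langle n, f, n_0 \rangle \in H_1$

Figure 13: Expansion Relation

$\langle H, \rho, K \rangle \stackrel{n_0}{\underset{n'}{\Uparrow}} \langle H', \rho', K' \rangle$ holds when

  $H' = H \setminus H_0 \cup H'_0 \cup H'_1$
  $n' \notin \mathrm{nodes}(H')$, if $K(n') = i$
  $\rho' = \rho[n_0 \mapsto \rho(n')]$
  $K' = K[n_0 \mapsto i]$
  $\mathrm{localCheck}(n_0, \langle H', \rho', K' \rangle)$
  $H_0 \subseteq H \cap (\mathrm{onstage}(H) \times F \times \{n'\} \cup \{n'\} \times F \times \mathrm{onstage}(H))$
  $H_1 \subseteq H \cap (\mathrm{offstage}(H) \times F \times \{n'\} \cup \{n'\} \times F \times \mathrm{offstage}(H))$
  $H'_0 = \mathrm{swing}(n', n_0, H_0)$
  $H'_1 \subseteq \mathrm{swing}(n', n_0, H_1)$

$\mathrm{swing}(n_{old}, n_{new}, H) = \{\langle n_{new}, f, n \rangle \mid \langle n_{old}, f, n \rangle \in H\} \cup \{\langle n, f, n_{new} \rangle \mid \langle n, f, n_{old} \rangle \in H\} \cup \{\langle n_{new}, f, n_{new} \rangle \mid \langle n_{old}, f, n_{old} \rangle \in H\}$

Figure 14: Instantiation Relation

$\langle H, \rho, K \rangle \stackrel{n_0}{\parallel} \langle H, \rho, K \rangle$, if $\mathrm{acycCheck}(n_0, \langle H, \rho, K \rangle, \mathrm{offstage}(H))$

$\langle H, \rho, K \rangle \stackrel{n_0}{\parallel} \langle H', \rho', K' \rangle$, if $\neg \mathrm{acycCheck}(n_0, \langle H, \rho, K \rangle, \mathrm{offstage}(H))$

where

  $H' = (H \setminus H_{cyc}) \cup H_{off} \cup B_{fNR} \cup B_{fR} \cup B_{tNR} \cup B_{tR} \cup N_f \cup N_t$
  $H_{cyc} = \{\langle n_1, f, n_2 \rangle \mid n_1 \text{ or } n_2 \in S_{cyc}\}$
  $H_{off} = \{\langle n'_1, f, n'_2 \rangle \mid n_1 = c(n'_1), n_2 = c(n'_2), n_1, n_2 \in \mathrm{offstage}_1(H), n_1 \text{ or } n_2 \in S_{cyc}, \langle n_1, f, n_2 \rangle \in H\} \setminus (S_R \times \mathrm{acyclic}(r) \times S_{NR})$
  $H \cap (\mathrm{onstage}(H) \times F \cup \{n_0\} \times (F \setminus \mathrm{acyclic}(r))) \times S_{cyc} = A_{fNR} \uplus A_{fR}$
  $H \cap S_{cyc} \times ((F \setminus \mathrm{acyclic}(r)) \times \{n_0\} \cup F \times \mathrm{onstage}(H)) = A_{tNR} \uplus A_{tR}$
  $B_{fNR} = \{\langle n_1, f, h_{NR}(n_2) \rangle \mid \langle n_1, f, n_2 \rangle \in A_{fNR}\}$
  $B_{fR} = \{\langle n_1, f, h_R(n_2) \rangle \mid \langle n_1, f, n_2 \rangle \in A_{fR}\}$
  $B_{tNR} = \{\langle h_{NR}(n_1), f, n_2 \rangle \mid \langle n_1, f, n_2 \rangle \in A_{tNR}\}$
  $B_{tR} = \{\langle h_R(n_1), f, n_2 \rangle \mid \langle n_1, f, n_2 \rangle \in A_{tR}\}$
  $N_f = \{\langle n_0, f, n' \rangle \mid n' \in S_R, \langle n_0, f, c(n') \rangle \in H, f \in \mathrm{acyclic}(r)\}$
  $N_t = \{\langle n', f, n_0 \rangle \mid n' \in S_{NR}, \langle c(n'), f, n_0 \rangle \in H, f \in \mathrm{acyclic}(r)\}$
  $S_{cyc} = \{n \mid \exists n_1, \ldots, n_{p-1} \in \mathrm{offstage}(H) : \langle n_0, f_0, n_1 \rangle, \ldots, \langle n_k, f_k, n \rangle, \langle n, f_{k+1}, n_{k+2} \rangle, \ldots, \langle n_{p-1}, f_{p-1}, n_0 \rangle \in H, f_0, \ldots, f_{p-1} \in \mathrm{acyclic}(r)\}$
  $\mathrm{offstage}_1(H) = \mathrm{offstage}(H) \setminus \{n_0\}$
  $r = \rho(n_0)$
  $\rho'(c(n)) = \rho(n)$
  $K'(c(n)) = K(n)$

Figure 15: Split Relation

Figure 16: A Role Graph for an Acyclic List (panel b: After Split)

Figure 15 shows the definition of the split operation on
node $n_0$, denoted by $\stackrel{n_0}{\parallel}$. Let $G = \langle H, \rho, K \rangle$ be the initial
role graph and $\rho(n_0) = r$. If $\mathrm{acyclic}(r) = \emptyset$, then the split
operation returns the original graph G; otherwise it proceeds
as follows. Call a path in graph H cycle-inducing if all of its
nodes are offstage and all of its edges are in acyclic(r). Let
$S_{cyc}$ be the set of nodes n such that there is a cycle-inducing
path from $n_0$ to n and a cycle-inducing path from n to $n_0$.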
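
The set $S_{cyc}$ can be computed as the intersection of forward and backward reachability from $n_0$ along cycle-inducing paths. The Python sketch below assumes edges are (source, field, target) triples, `fields` stands for acyclic(r), and `offstage` is the set of offstage nodes excluding $n_0$; the helper names are hypothetical.

```python
def _reach(start, edges, fields, offstage, forward):
    """Offstage nodes reachable from `start` (or reaching it, if not forward)
    along edges whose field is in `fields`, passing only through offstage nodes."""
    seen, stack = set(), [start]
    while stack:
        n = stack.pop()
        for (s, f, t) in edges:
            a, b = (s, t) if forward else (t, s)
            if a == n and f in fields and b in offstage and b not in seen:
                seen.add(b)
                stack.append(b)
    return seen

def s_cyc(n0, edges, fields, offstage):
    """Nodes lying on some cycle-inducing cycle through n0."""
    return (_reach(n0, edges, fields, offstage, True)
            & _reach(n0, edges, fields, offstage, False))
```

When the result is empty, no offstage cycle through $n_0$ is possible in the role graph, so there is nothing for split to separate.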

The goal of the split operation is to split the set $S_{cyc}$ into
a fresh set of nodes $S_{NR}$ representing objects definitely not
reachable from $n_0$ along edges in acyclic(r) and a fresh set of
nodes $S_R$ representing objects definitely reachable from $n_0$.
Each of the newly generated graphs $H'$ has the following
properties:

1) merging the corresponding nodes from $S_{NR}$ and $S_R$ in
$H'$ yields the original graph H;

2) $n_0$ is not a member of any cycle in $H'$ consisting of
offstage nodes and edges in acyclic(r);

3) onstage nodes in $H'$ have the same number of fields and
aliases as in H.

Let $S_0 = \mathrm{nodes}(H) \setminus S_{cyc}$ and let $h_{NR} : S_{cyc} \to S_{NR}$ and
$h_R : S_{cyc} \to S_R$ be bijections. Define a function
$c : \mathrm{nodes}(H') \to \mathrm{nodes}(H)$ as follows:

$c(n) = n$ if $n \in S_0$; $c(n) = h_{NR}^{-1}(n)$ if $n \in S_{NR}$; $c(n) = h_R^{-1}(n)$ if $n \in S_R$.

Then $H' \subseteq \{\langle n'_1, f, n'_2 \rangle \mid \langle c(n'_1), f, c(n'_2) \rangle \in H\}$.

Because there are two copies of $S_{cyc}$ in $H'$, there might be
multiple edges $\langle n'_1, f, n'_2 \rangle$ in $H'$ corresponding to an edge
$\langle c(n'_1), f, c(n'_2) \rangle \in H$.



If both $n'_1$ and $n'_2$ are offstage nodes other than $n_0$, we
always include $\langle n'_1, f, n'_2 \rangle$ in $H'$ unless $\langle n'_1, f, n'_2 \rangle \in S_R \times \mathrm{acyclic}(r) \times S_{NR}$.
The last restriction prevents cycles in $H'$.

For an edge $\langle n_1, f, n_2 \rangle \in H$ where $n_1 \in \mathrm{onstage}(H)$ and
$n_2 \in S_{cyc}$ we include in $H'$ either the edge $\langle n_1, f, h_{NR}(n_2) \rangle$
or $\langle n_1, f, h_R(n_2) \rangle$ but not both. Split generates multiple
graphs $H'$ to cover both cases. We proceed analogously if
$n_2 \in \mathrm{onstage}(H)$ and $n_1 \in S_{cyc}$. The node $n_0$ itself is treated
in the same way as onstage nodes for $f \notin \mathrm{acyclic}(r)$. If
$f \in \mathrm{acyclic}(r)$ then we choose references to $n_0$ to have a
source in $S_{NR}$, whereas the references from $n_0$ have their target
in $S_R$.

Details of the split construction are given in Figure 15.
The intuitive meaning of the sets of edges is the following:
$H_{off}$ : edges between offstage nodes
$B_{fNR}$ : edges from onstage nodes to $S_{NR}$
$B_{fR}$ : edges from onstage nodes to $S_R$
$B_{tNR}$ : edges from $S_{NR}$ to onstage nodes
$B_{tR}$ : edges from $S_R$ to onstage nodes
$N_f$ : acyclic(r)-edges from $n_0$ to $S_R$
$N_t$ : acyclic(r)-edges from $S_{NR}$ to $n_0$
The sets $B_{fNR}$ and $B_{fR}$ are created as images of the sets
$A_{fNR}$ and $A_{fR}$, which partition edges from onstage nodes to
nodes in $S_{cyc}$. Similarly, the sets $B_{tNR}$ and $B_{tR}$ are created
as images of the sets $A_{tNR}$ and $A_{tR}$, which partition edges
from nodes in $S_{cyc}$ to onstage nodes.



We note that if in the split operation $S_{cyc} = \emptyset$ then the
operation has no effect and need not be performed. In
Figure 16, after performing a single split, there is no need to
split for subsequent elements of the list. Examples like this
indicate that split will not be invoked frequently during the
analysis.

6.2.2 Contraction 

Figure 17 shows the non-error transitions of the contraction
relation $\stackrel{n}{\succeq}$. The analysis uses contraction when a local
variable reference to node n is removed. If there are other local
references to n, the result is the original graph. Otherwise
n has just gone offstage, so the analysis invokes nodeCheck. If
the check fails, the result is $\bot_G$. If the role check succeeds,
the contraction invokes the normalization operation to ensure
that the role graph remains bounded. For simplicity, we use
normalization whenever nodeCheck succeeds, although it is
sufficient to perform normalization only at program points
adjacent to back edges of the control-flow graph.

Normalization. Figure 18 shows the normalization relation.
Normalization accepts a role graph $\langle H, \rho, K \rangle$ and
produces a normalized role graph $\langle H', \rho', K' \rangle$ which is a factor
graph of $\langle H, \rho, K \rangle$ under the equivalence relation $\sim$. Two
offstage nodes are equivalent under $\sim$ if they have the same
role and the same reachability from onstage nodes. Here we
consider node $n$ to be reachable from an onstage node $n_0$
iff there is some path from $n_0$ to $n$ whose edges belong to
$\mathrm{acyclic}(\rho(n_0))$ and whose nodes are all in $\mathrm{offstage}(H)$. Note
that, by construction, normalization avoids merging nodes
which were previously generated in the split operation,
while still ensuring a bound on the size of the role graph.
For a procedure with $l$ local variables, $f$ fields and $r$ roles the
number of nodes in a role graph is on the order of $r 2^l$, so the
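$\langle H, \rho, K \rangle \stackrel{n}{\succeq} \langle H, \rho, K \rangle$
    if $\exists x \in \mathrm{var}(proc) : \langle proc, x, n \rangle \in H$

$\langle H, \rho, K \rangle \stackrel{n}{\succeq} \mathrm{normalize}(\langle H, \rho, K \rangle)$
    if $\mathrm{nodeCheck}(n, \langle H, \rho, K \rangle, \mathrm{offstage}(H))$

Figure 17: Contraction Relation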



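$\mathrm{normalize}(\langle H, \rho, K \rangle) = \langle H', \rho', K' \rangle$

where $H' = \{\langle n_1/_\sim, f, n_2/_\sim \rangle \mid \langle n_1, f, n_2 \rangle \in H\}$

$\rho'(n/_\sim) = \rho(n)$

$K'(n/_\sim) = \begin{cases} i, & n/_\sim = \{n\},\ K(n) = i \\ s, & \text{otherwise} \end{cases}$

$n_1 \sim n_2$ iff $n_1 = n_2$ or
    ($n_1, n_2 \in \mathrm{offstage}(H)$, $\rho(n_1) = \rho(n_2)$,
     $\forall n_0 \in \mathrm{onstage}(H) : (\mathrm{reach}(n_0, n_1) \text{ iff } \mathrm{reach}(n_0, n_2))$)

$\mathrm{reach}(n_0, n)$ iff $\exists n_1, \ldots, n_{p-1} \in \mathrm{offstage}(H),\ \exists f_1, \ldots, f_p \in \mathrm{acyclic}(\rho(n_0)) :$
    $\langle n_0, f_1, n_1 \rangle, \ldots, \langle n_{p-1}, f_p, n \rangle \in H$

Figure 18: Normalization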



maximum size of a chain in the lattice is of the order of $2^{r 2^l}$.
To ensure termination we consider role graphs equal up to
isomorphism. Isomorphism checking can be done efficiently
if normalization assigns canonical names to the equivalence
classes it creates.
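
The factor-graph construction of Figure 18 can be illustrated with a short sketch. This is not the authors' implementation: it assumes the role graph is given as a set of `(source, field, target)` triples over heap nodes only, that `rho`, `K`, and `acyclic` are plain dictionaries, and that each equivalence class is named by its key (role plus the set of onstage nodes that reach it), which doubles as a canonical name.

```python
from collections import defaultdict

def reachable(H, n0, fields, offstage):
    """Offstage nodes reachable from n0 along edges labelled with `fields`,
    passing only through offstage nodes."""
    seen, stack = set(), [n0]
    while stack:
        u = stack.pop()
        for src, f, dst in H:
            if src == u and f in fields and dst in offstage and dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return seen

def normalize(H, rho, K, onstage, acyclic):
    """Merge offstage nodes that have the same role and the same
    reachability from onstage nodes (hypothetical encoding)."""
    offstage = set(rho) - set(onstage)
    reach = {n0: reachable(H, n0, acyclic[rho[n0]], offstage) for n0 in onstage}

    def key(n):
        if n not in offstage:
            return ("on", n)  # onstage nodes are never merged
        return ("off", rho[n], frozenset(n0 for n0 in onstage if n in reach[n0]))

    classes = defaultdict(set)
    for n in rho:
        classes[key(n)].add(n)

    H2 = {(key(s), f, key(d)) for s, f, d in H}
    rho2 = {k: rho[next(iter(ms))] for k, ms in classes.items()}
    # A class stays individual only if it is a singleton of an individual node.
    K2 = {k: "i" if len(ms) == 1 and K[next(iter(ms))] == "i" else "s"
          for k, ms in classes.items()}
    return H2, rho2, K2
```

Because the class key depends only on roles and reachability, two normalized graphs that are isomorphic receive identical node names in this encoding, which is one way to obtain the canonical naming mentioned above.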

6.2.3 Symbolic Execution 

Figure 19 shows the symbolic execution relation $\stackrel{s}{\Longrightarrow}$. In
most cases, the symbolic execution of a statement acts on
the abstract heap in the same way that the statement would
act on the concrete heap. In particular, the Store statement
always performs strong updates. The simplicity of symbolic
execution is due to conditions 3) and 5) in the abstraction
relation $\alpha$. These conditions are ensured by the expansion relation,
which instantiates nodes, allowing strong updates. The symbolic
execution also verifies the consistency conditions that
are not verified by the expansion or contraction relations.

Verifying Reference Removal Consistency. The abstract
execution $\stackrel{st}{\leadsto}$ for the Store statement can easily verify
the Store safety condition from Section 5.4.2 because the
set of onstage and offstage nodes is known precisely for every
role graph. It returns $\bot_G$ if the safety condition fails.

Symbolic Execution of setRole. The setRole(x:r)
statement sets the role of the node referenced by variable
x to $r$. Let $G = \langle H, \rho, K \rangle$ be the current role graph and
let $\langle proc, x, n_x \rangle \in H$. If $n_x$ has no adjacent offstage nodes,
the role change always succeeds. In general, there are restrictions
on when the change can be done. Let $\langle H_c, \rho_c \rangle$
be a concrete heap with role assignment represented by $G$
and let $h$ be a homomorphism from $H_c$ to $H$. Let $h(o_x) = n_x$.
Let $r_0 = \rho_c(o_x)$. The symbolic execution must make sure
that the condition $\mathrm{conW}(\rho_c, H_c, \mathrm{offstage}(H_c))$ continues to
hold after the role change. Because the set of onstage nodes
does not change, it suffices to ensure that the original roles
for offstage nodes are consistent with the new role $r$. The
acyclicity constraint involves only offstage nodes, so it remains
satisfied. The other role constraints are local, so they
can only be violated for offstage neighbors of $n_x$. To make
sure that no violations occur, we require:

1. $r \in \mathrm{field}_f(\rho(n))$ for all $\langle n, f, n_x \rangle \in H$, and

2. $\langle r, f \rangle \in \mathrm{slot}_i(\rho(n))$ for all $\langle n_x, f, n \rangle \in H$ and every slot
$i$ such that $\langle r_0, f \rangle \in \mathrm{slot}_i(\rho(n))$.

This is sufficient to guarantee $\mathrm{conW}(\rho_c, H_c, \mathrm{offstage}(H_c))$.
To ensure condition 2) in Definition 22 of the abstraction
relation, we require that for every $\langle f, g \rangle \in \mathrm{identities}(r)$,

1. $\langle f, g \rangle \in \mathrm{identities}(r_0)$, or

2. for all $\langle n_x, f, n \rangle \in H$: $K(n) = i$ and ($\langle n, g, n' \rangle \in H$
implies $n' = n_x$).

We use $\mathrm{roleChOk}(n_x, r, \langle H, \rho, K \rangle)$ to denote the check just
described.

Symbolic Execution of roleCheck. To symbolically execute
roleCheck($x_1, \ldots, x_p$, ra), we ensure that the conW
predicate of the concrete semantics is satisfied for the concrete
heaps which correspond to the current abstract role
graph. The symbolic execution for roleCheck returns the
error graph $\bot_G$ if $\rho$ is inconsistent with ra or if any of the
nodes $n_i$ referenced by $x_i$ fail to satisfy nodeCheck.

6.2.4 Node Check 

The analysis uses the localCheck, acycCheck, acycCheckAll,
and nodeCheck predicates to incrementally maintain the
abstraction relation.

We first define the predicate localCheck, which roughly 
corresponds to the predicate locallyConsistent (Definition|5J, 
but ignores the nonlocal acyclicity condition and additionally
ensures condition 2) from Definition 22.
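$$
\begin{array}{lll}
\textbf{Statement } s & \textbf{Transition} & \textbf{Conditions} \\
x = y.f &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle proc, x, n_f \rangle\}, \rho, K \rangle &
  \langle proc, y, n_y \rangle, \langle n_y, f, n_f \rangle \in H \\
x.f = y &
  \langle H \uplus \{\langle n_x, f, n_f \rangle\}, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle n_x, f, n_y \rangle\}, \rho, K \rangle &
  \langle proc, x, n_x \rangle, \langle proc, y, n_y \rangle \in H;\ n_f \in \mathrm{onstage}(H) \\
x = y &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle proc, x, n_y \rangle\}, \rho, K \rangle &
  \langle proc, y, n_y \rangle \in H \\
x = \mathrm{new} &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle proc, x, n_n \rangle\}, \rho', K \rangle &
  n_n \text{ fresh};\ \rho' = \rho[n_n \mapsto \mathrm{unknown}] \\
\mathrm{test}(c) &
  \langle H, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H, \rho, K \rangle &
  \mathrm{satisfied}(c, H) \\
\mathrm{setRole}(x{:}r) &
  \langle H, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H, \rho[n_x \mapsto r], K \rangle &
  \langle proc, x, n_x \rangle \in H;\ \mathrm{roleChOk}(n_x, r, \langle H, \rho, K \rangle) \\
\mathrm{roleCheck}(x_{1..p}, \mathrm{ra}) &
  \langle H, \rho, K \rangle \stackrel{s}{\Longrightarrow} \langle H, \rho, K \rangle &
  \forall i\ \langle proc, x_i, n_i \rangle \in H;\ \mathrm{nodeCheck}(n_i, \langle H, \rho, K \rangle, S);\ S = \mathrm{offstage}(H) \cup \{n_i\}_i;\ \rho(n_i) = \mathrm{ra}(n_i)
\end{array}
$$

$\mathrm{satisfied}(x{=}{=}y, H_c)$ iff $\{o \mid \langle proc, x, o \rangle \in H_c\} = \{o \mid \langle proc, y, o \rangle \in H_c\}$
$\mathrm{satisfied}(!(x{=}{=}y), H_c)$ iff not $\mathrm{satisfied}(x{=}{=}y, H_c)$

Figure 19: Symbolic Execution of Basic Statements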



Definition 25 For a role graph $G = \langle H, \rho, K \rangle$, an individual
node $n$ and a set $S$, the predicate localCheck(n, G) holds
iff the following conditions are met. Let $r = \rho(n)$.

1A. (Outgoing fields check) For fields $f \in F$, if $\langle n, f, n' \rangle \in H$
then $\rho(n') \in \mathrm{field}_f(r)$.

2A. (Incoming slots check) Let $\{\langle n_1, f_1 \rangle, \ldots, \langle n_k, f_k \rangle\} =
\{\langle n', f \rangle \mid \langle n', f, n \rangle \in H\}$ be the set of all aliases of
node $n$ in abstract heap $H$. Then $k = \mathrm{slotno}(r)$ and
there exists a permutation $p$ of the set $\{1, \ldots, k\}$ such
that $\langle \rho(n_i), f_i \rangle \in \mathrm{slot}_{p_i}(r)$ for all $i$.

3A. (Identity Check) If $\langle n, f, n' \rangle \in H$, $\langle n', g, n'' \rangle \in H$,
$\langle f, g \rangle \in \mathrm{identities}(r)$, and $K(n') = i$, then $n = n''$.

4A. (Neighbor Identity Check) For every edge $\langle n', f, n \rangle \in H$,
if $K(n') = i$, $\rho(n') = r'$ and $\langle f, g \rangle \in \mathrm{identities}(r')$ then
$\langle n, g, n' \rangle \in H$.

5A. (Field Sanity Check) For every $f \in F$ there is exactly
one edge $\langle n, f, n' \rangle \in H$.

Conditions 1A and 2A correspond to conditions 1) and 2)
in Definition 121 Condition 3) in Definition 1191 is not neces- 
sarily implied by condition 3A) if some of the neighbors of 
$n$ are summary nodes. Condition 3) cannot be established
based only on summary nodes, because verifying an identity
constraint for field $f$ of node $n$ where $\langle n, f, n' \rangle \in H$ requires
knowing the identity of $n'$, not only its existence and role.
We therefore rely on condition 2) of Definition 22 to
ensure that identity relations of neighbors of node $n$ are satisfied
before $n$ moves offstage.

The predicate acycCheck(n, G, S) verifies the acyclicity
condition from Definition 19 for the node that has just been
brought onstage. The split operation uses acycCheck to
check whether it should split any nodes (Section 6.2.1). The
set $S$ represents offstage nodes.

Definition 26 We say that a node $n$ satisfies an acyclicity
check in graph $G = \langle H, \rho, K \rangle$ with respect to the set
$S$, and we write acycCheck(n, G, S), iff it is not the case
that $H$ contains a cycle $n_1, f_1, \ldots, n_s, f_s, n_1$ where $n_1 = n$,
$f_1, \ldots, f_s \in \mathrm{acyclic}(\rho(n))$ and $n_1, \ldots, n_s \in S$.
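
Definition 26 amounts to a cycle search restricted to the acyclic fields of one role and to the nodes of $S$. The following sketch shows one way to implement it; the edge-triple encoding and the function name `acyc_check` are assumptions made for illustration, and the caller is expected to pass `acyclic(ρ(n))` as `acyclic_fields`.

```python
def acyc_check(n, H, acyclic_fields, S):
    """Return True iff H contains no cycle through n that uses only fields
    in `acyclic_fields` and only nodes in S (hypothetical encoding:
    H is a set of (source, field, target) triples)."""
    if n not in S:
        return True  # every node on the cycle, including n, must be in S
    seen, stack = set(), [n]
    while stack:
        u = stack.pop()
        for src, f, dst in H:
            if src != u or f not in acyclic_fields or dst not in S:
                continue
            if dst == n:
                return False  # closed a cycle back to n
            if dst not in seen:
                seen.add(dst)
                stack.append(dst)
    return True
```

The stronger acycCheckAll variant would repeat the same search once per role occurring on the candidate cycle rather than only for the role of `n`.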

The analysis uses the predicate acycCheckAll(n, G, S) in
the contraction relation (Section 6.2.2) to make sure that $n$
is not a member of any cycle that would violate the acyclicity
condition of any of the nodes in $S$, including $n$.

Definition 27 We say that a node $n$ satisfies a strong
acyclicity check in graph $G = \langle H, \rho, K \rangle$ with respect to a
set $S$, and we write acycCheckAll(n, G, S), iff it is not the
case that $H$ contains a cycle $n_1, f_1, \ldots, n_s, f_s, n_1$ where
$n_1 = n$, $f_1, \ldots, f_s \in \mathrm{acyclic}(\rho(n_i))$ for some $1 \le i \le s$,
and $n_1, \ldots, n_s \in S$.

acycCheckAll is a stronger condition than acycCheck because
it ensures the absence of cycles containing $n$ for the $\mathrm{acyclic}(\rho(n_i))$
fields of all offstage nodes $n_i$, and not only for the fields
$\mathrm{acyclic}(\rho(n))$.

The analysis uses the predicate nodeCheck to verify that
moving a node $n$ offstage does not violate role consistency
for offstage nodes.

Definition 28 nodeCheck(n, G, S) holds iff both predicates
localCheck(n, G) and acycCheckAll(n, G, S) hold.

7 Interprocedural Role Analysis 

This section describes the interprocedural aspects of our role
analysis. Interprocedural role analysis can be viewed as an
instance of the functional approach to interprocedural data-flow
analysis [41]. For each program point $p$, role analysis
approximates program traces from procedure entry to point
$p$. The solution in [41] proposes tagging the entire data-flow
fact $G$ at point $p$ with the data-flow fact $G_0$ at procedure entry.
In contrast, our analysis computes the correspondence
between heaps at procedure entry and heaps at point $p$ at
the granularity of sets of objects that constitute role graphs.
This allows our analysis to detect which regions of the heap
have been modified. We approximate the concrete executions
of a procedure with procedure transfer relations consisting
of 1) an initial context and 2) a set of effects. Effects
are fine-grained transfer relations which summarize load and
store statements and can naturally describe local heap modifications.
In this paper we assume that procedure transfer
relations are supplied and we are concerned with a) verifying
that transfer relations are a conservative approximation of
the procedure implementation and b) instantiating transfer relations
at call sites.



imply that parameters p1 and p2 must be aliased,

p1 -> n1
p2 -> n2

force p1 and p2 to be unaliased, whereas

p1 -> n1|n2
p2 -> n1|n2

allow for both possibilities. A heap edge n -f-> m denotes
$\langle n, f, m \rangle \in H_{IC}$. The shorthand notation



7.1 Procedure Transfer Relations 

A transfer relation for a procedure proc extends the pro- 
cedure signature with an initial context context(proc), and 
procedure effects effect(proc). 

7.1.1 Initial Context 

Figures 1^ and ITTI contain examples of initial context speci- 
fication. An initial context is a description of the initial role 
graph $\langle H_{IC}, \rho_{IC}, K_{IC} \rangle$ where $\rho_{IC}$ and $K_{IC}$ are determined by a
nodes declaration and $H_{IC}$ is determined by an edges declaration.
The initial role graph specifies a set of concrete heaps
at procedure entry and assigns names for sets of nodes in
these heaps. The next definition is similar to Definition 22.

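Definition 29 We say that a concrete heap $\langle H_c, \rho_c \rangle$ is represented
by the initial role graph $\langle H_{IC}, \rho_{IC}, K_{IC} \rangle$ and write
$\langle H_c, \rho_c \rangle \, \alpha_0 \, \langle H_{IC}, \rho_{IC}, K_{IC} \rangle$, iff there exists a function
$h_0 : \mathrm{nodes}(H_c) \to \mathrm{nodes}(H_{IC})$ such that

1. $\mathrm{conW}(\rho_c, H_c, h_0^{-1}(\mathrm{read}(proc)))$;

2. $h_0$ is a graph homomorphism;

3. $K_{IC}(n) = i$ implies $|h_0^{-1}(n)| \le 1$;

4. $h_0(\mathrm{null}_c) = \mathrm{null}$ and $h_0(proc_c) = proc$;

5. $\rho_c(o) = \rho_{IC}(h_0(o))$ for every object $o \in \mathrm{nodes}(H_c)$.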

Here read(proc) is the set of initial-context nodes read by 
the procedure (see below). For simplicity, we assume one 
context per procedure; it is straightforward to generalize the 
treatment to multiple contexts. 

A context is specified by declaring a list of nodes and a 
list of edges. 

A list of nodes is given with a nodes declaration. It specifies
a role for every node at procedure entry. Individual nodes
are denoted with lowercase identifiers, summary nodes with
uppercase identifiers. By using summary nodes it is possible
to indicate disjointness of entire heap regions and reachability
between nodes in the heap.

There are two kinds of edges in the initial role graph: parameter
edges and heap edges. A parameter edge p->pn is
interpreted as $\langle proc, p, pn \rangle \in H_{IC}$. We require every parameter
edge to have an individual node as a target; we call such a
node a parameter node. The role of a parameter node referenced
by $\mathrm{param}_i(proc)$ is always $\mathrm{preR}_i(proc)$. Since different
nodes in the initial role graph denote disjoint sets of concrete
objects, parameter edges

p1 -> n1
p2 -> n1


n1 -f-> n2
   -g-> n3

denotes two heap edges $\langle n1, f, n2 \rangle, \langle n1, g, n3 \rangle \in H_{IC}$. An expression
n1 -f-> n2|n3 denotes two edges n1 -f-> n2 and
n1 -f-> n3. We use similar shorthands for parameter edges.
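
To make the shorthands concrete, the sketch below expands an edges declaration into explicit triples. It is only an illustration: the function name `expand_edges`, the regular expression, and the assumption that alternatives are written without spaces around `|` are not taken from the paper. Parameter edges `p -> pn` are emitted as triples with source `proc`, following the interpretation given above.

```python
import re

# An arrow is either "->" (parameter edge) or "-field->" (heap edge);
# any other token is a node name or a "|"-separated list of node names.
TOKEN = re.compile(r"-(?:\w+-)?>|\w+(?:\|\w+)*")

def expand_edges(decl):
    """Expand shorthand edge declarations into (source, label, target) triples."""
    edges = []
    for item in decl.replace(";", "").split(","):
        tokens = TOKEN.findall(item)
        if not tokens:
            continue
        source, rest = tokens[0], tokens[1:]
        # After the source, tokens alternate: arrow, targets, arrow, targets, ...
        for arrow, targets in zip(rest[0::2], rest[1::2]):
            field = arrow.strip("->") or None
            for target in targets.split("|"):
                if field is None:
                    edges.append(("proc", source, target))   # parameter edge
                else:
                    edges.append((source, field, target))    # heap edge
    return edges
```

For example, `expand_edges("n1 -f-> n2|n3 -g-> n3")` yields the three heap edges from n1 that the shorthand stands for.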




sleeping? roc 



nodes ph : RunningHeader,
      P1, px, P2 : RunningProc,
      lx : LiveHeader,
      LL1, l2, LL2 : LiveList;
edges p-> px, l-> lx,
      ph -next-> P1|px
         -prev-> px|P2,
      P1 -next-> P1|px
         -prev-> ph|P1,
      px -next-> P2|ph
         -prev-> P1|ph,
      P2 -next-> P2|ph
         -prev-> P2|px,
      lx -next-> LL1|l2,
      LL1 -next-> LL1|l2
          -proc-> P1|P2|SleepingProc,
      l2 -next-> LL2|null
         -proc-> px,
      LL2 -next-> LL2|null
          -proc-> P1|P2|SleepingProc



Figure 20: Initial Context for kill Procedure 
Example 30 Figure 20 shows an initial context graph for
the kill procedure from Example 17. It is a refinement of
the role reference diagram, as it gives a description
of the heap specific to the entry of the kill procedure. The
initial context makes explicit the fact that there is only one
header node for the list of running processes (ph) and one
header node for the list of all active processes (lx). More importantly,
it shows that traversing the list of active processes
reaches a node l2 whose proc field references the parameter
node px. This is sufficient for the analysis to conclude that
there will be no null pointer dereferences in the while loop
of the kill procedure since l2 is reached before null.

We assume that the initial context always contains the role 
reference diagram RRD (Definition (HJ. Nodes from RRD are 
called anonymous nodes and are referred to via role name. 
This further reduces the size of initial context specifications
by leveraging global role definitions. In Figure 20 there is
no need to specify edges originating from SleepingProc or
even mention the node SleepingTree, since role definitions
alone contain enough information on this part of the heap
to enable the analysis of the procedure. Note, however, that
all edges between anonymous nodes and named nodes must
be explicitly specified.

7.1.2 Procedure Effects 

Procedure effects conservatively approximate the region of 
the heap that the procedure accesses and indicate changes 
to the referencing relationships in that region. There are two 
kinds of effects: read effects and write effects. 

A read effect specifies a set read (proc) of initial graph 
nodes accessed by the procedure. It is used to ensure that 
the accessibility condition in Section f5.4..'-{l is satisfied. If the 
set of nodes denoted by read (proc) is mapped to a node n 
which is onstage in the caller but is not an argument of the 
procedure call, a role check error is reported at the call site. 

Write effects are used to modify the caller's role graph to conservatively
model the procedure call. A write effect $e_1.f = e_2$
approximates Store operations within a procedure. The expression
$e_1$ denotes objects being written to, $f$ denotes the
field written, and $e_2$ denotes the set of objects which could
be assigned to the field. Write effects are may effects by default,
which means that the procedure is free not to perform
them. It is possible to specify that a write effect must be
performed by prefixing it with a "!" sign.

Example 31 In Figure 21 the insert procedure inserts
an isolated cell at the end of an acyclic singly linked list.
As a result, the role of the cell changes to LN. The initial
context declares parameter nodes ln and xn (whose initial
roles are deduced from roles of parameters), and mentions
the anonymous LN node from a default copy of the role reference
diagram RRD. The code of the procedure is summarized
with two write effects. The first write effect indicates that
the procedure may perform zero or more Store operations
to field next of nodes mapped to ln or LN in context(proc).
The second write effect indicates that the execution of the
procedure must perform a Store to the field next of the xn node
where the reference stored is either a node mapped onto
the anonymous LN node or null.



procedure insert(l : L,
                 x : IsolatedN ->> LN)
nodes ln, xn;
edges l-> ln, x-> xn,
      ln -next-> LN|null;
effects ln|LN . next = xn,
        ! xn . next = LN|null;
local c, p;
{
  p = l;
  c = l.next;
  while (c != null) {
    p = c;
    c = p.next;
  }
  p.next = x;
  x.next = c;
  setRole(x:LN);
}

Figure 21: Insert Procedure for Acyclic List 

procedure insertSome(l : L)
nodes ln;
edges l-> ln,
      ln -next-> LN|null;
effects ln|LN . next = NEW,
        NEW . next = LN|null;
aux c, p, x;
{
  p = l;
  c = l.next;
  while (c != null) {
    p = c;
    c = p.next;
  }
  x = new;
  p.next = x;
  x.next = c;
  setRole(x:LN);
}



Figure 22: Insert Procedure with Object Allocation 

Effects also describe assignments that procedures perform
on the newly created nodes. Here we adopt a simple solution
of using a single summary node denoted NEW to represent
all nodes created inside the procedure. We write $\mathrm{nodes}_0(H_{IC})$
for the set $\mathrm{nodes}(H_{IC}) \cup \{\mathrm{NEW}\}$.

Example 32 Procedure insertSome in Figure 22 is similar
to procedure insert in Figure 21 except that the node inserted
is created inside the procedure. It is therefore referred
to in effects via the generic summary node NEW.

We represent all may write effects as a set mayWr(proc) of
triples $\langle n_j, f, n'_j \rangle$ where $n_j, n'_j \in \mathrm{nodes}_0(H_{IC})$ and $f \in F$. We
represent must write effects as a sequence $\mathrm{mustWr}_j(proc)$ of
subsets of the set $K_{IC}^{-1}(i) \times F \times \mathrm{nodes}_0(H_{IC})$. Here
$1 \le j \le \mathrm{mustWrNo}(proc)$.

To simplify the interpretation of the declared procedure
effects in terms of concrete reads and writes, we require
the union $\bigcup_i \mathrm{mustWr}_i(proc)$ to be disjoint from the
set mayWr(proc). We also require the nodes $n_1, \ldots, n_k$ in
a must write effect $n_1 | \cdots | n_k . f = e_2$ to be individual nodes.
This allows strong updates when instantiating effects (Section 7.3.2).

7.1.3 Semantics of Procedure Effects 

We now give precise meaning to procedure effects. Our def- 
inition is slightly complicated by the desire to capture the 
set of nodes that are actually read in an execution while still 
allowing a certain amount of observational equivalence for 
write effects. 

The effects of procedure proc define a subset of permissible
program traces in the following way. Consider
a concrete heap $H_c$ with role assignment $\rho_c$ such that
$\langle H_c, \rho_c \rangle \, \alpha_0 \, \langle H_{IC}, \rho_{IC}, K_{IC} \rangle$ with graph homomorphism $h_0$ from
Definition 29. Consider a trace $T$ starting from a state with
heap $H_c$ and role assignment $\rho_c$. Extract the subsequence
of all loads and stores in trace $T$. Replace Load x=y.f by a
concrete read "read $o_x$" where $o_x$ is the concrete object referenced
by x at the point of Load, and replace Store x.f=y by
a concrete write $o_x.f = o_y$ where $o_x$ is the object referenced
by x and $o_y$ the object referenced by y at the point of Store. Let
$p_1, \ldots, p_k$ be the sequence of all concrete read statements
and $q_1, \ldots, q_k$ the sequence of all concrete write statements.
We say that trace $T$ starting at $H_c$ conforms to the effects
iff for all choices of $h_0$ the following conditions hold:

1. $h_0(o) \in \mathrm{read}(proc)$ for every $p_i$ of the form read $o$;

2. there exists a subsequence $q_{i_1}, \ldots, q_{i_t}$ of $q_1, \ldots, q_k$ such
that

(a) executing $q_{i_1}, \ldots, q_{i_t}$ on $H_c$ yields the same result
as executing the entire sequence $q_1, \ldots, q_k$;

(b) the sequence $q_{i_1}, \ldots, q_{i_t}$ implements write effects
of procedure proc.

A typical way to obtain a sequence $q_{i_1}, \ldots, q_{i_t}$ from the sequence
$q_1, \ldots, q_k$ is to consider only the last write for each
pair $\langle o_i, f \rangle$ of object and field.
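
The last-write selection described above can be sketched as follows. The encoding is an assumption made for illustration: a concrete write is a triple `(object, field, value)` and the heap is a dictionary keyed by `(object, field)`. The helper `apply_writes` is included only to show that the folded subsequence produces the same heap as the full sequence.

```python
def fold_writes(writes):
    """Keep only the last write to each (object, field) pair,
    preserving the relative order of the writes that are kept."""
    last = {}
    for idx, (obj, field, _value) in enumerate(writes):
        last[(obj, field)] = idx
    return [w for idx, w in enumerate(writes) if last[(w[0], w[1])] == idx]

def apply_writes(heap, writes):
    """Execute a sequence of concrete writes on a copy of the heap."""
    result = dict(heap)
    for obj, field, value in writes:
        result[(obj, field)] = value
    return result
```

Since later writes to the same field overwrite earlier ones, executing the folded sequence and executing the whole sequence leave every field with the same final value, which is condition (a).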

We say that a sequence $q_{i_1}, \ldots, q_{i_t}$ implements write effects
mayWr(proc) and $\mathrm{mustWr}_i(proc)$ for $1 \le i \le i_0$,
$i_0 = \mathrm{mustWrNo}(proc)$, if and only if there exists an injection
$s : \{1, \ldots, i_0\} \to \{i_1, \ldots, i_t\}$ such that

1. $\langle h'(o), f, h'(o') \rangle \in \mathrm{mustWr}_i(proc)$ for every concrete
write $q_{s(i)}$ of the form $o.f = o'$, and

2. $\langle h'(o), f, h'(o') \rangle \in \mathrm{mayWr}(proc)$ for all concrete writes
$q_i$ of the form $o.f = o'$ for $i \in \{i_1, \ldots, i_t\} \setminus
\{s(1), \ldots, s(i_0)\}$.

Here $h'(n) = h_0(n)$ for $n \in \mathrm{nodes}(H_c)$ where $H_c$ is the initial
concrete heap and $h'(n) = \mathrm{NEW}$ otherwise.

It is possible (although not very common) for a single
concrete heap $H_c$ to have multiple homomorphisms $h_0$ to
the initial context $H_{IC}$. Note that in this case we require the
trace $T$ to conform to effects for all possible valid choices
of $h_0$. This places the burden of multiple choices of $h_0$ on
procedure transfer relation verification (Section 7.2) but in
turn allows the context matching algorithm in Section 7.3.1
to select an arbitrary homomorphism between a caller's role
graph and an initial context.



7.2 Verifying Procedure Transfer Relations 

In this section we show how the analysis makes sure that a
procedure conforms to its specification, expressed as an initial
context with a list of effects. To verify procedure effects,
we extend the analysis representation from Section 6.1. A
non-error role graph is now a tuple $\langle H, \rho, K, \tau, E \rangle$ where:

1. $\tau : \mathrm{nodes}(H) \to \mathrm{nodes}_0(H_{IC})$ is the initial context transformation
that assigns an initial context node $\tau(n) \in
\mathrm{nodes}(H_{IC})$ to every node $n$ representing objects that
existed prior to the procedure call, and assigns NEW to
every node representing objects created during procedure
activation;

2. $E \subseteq \bigcup_i \mathrm{mustWr}_i(proc)$ is a list of must write effects that
the procedure has performed so far.

The initial context transformation $\tau$ tracks how objects have
moved since the beginning of procedure activation and is
essential for verifying procedure effects which refer to initial
context nodes.

We represent the list $E$ of performed must effects as a partial
map from the set $K_{IC}^{-1}(i) \times F$ to $\mathrm{nodes}_0(H_{IC})$. This allows
the analysis to perform must effect folding by recording only
the last must effect for every pair $\langle n, f \rangle$ of individual node
$n$ and field $f$.

[entry.] = [{H,p,K, t,E) 

P : {proc} X {param^(proc)}i —* N,P C He 
Ho = {H\c \ {proc} X param(proc) x N) L) P 
rii = P(proc, paramj(proc)) 

Hi C Ho 

Hi\Ho<^ {{n', f, n") | {m, na} n {n,}, / 0} 
Vj : localCheck(nj , {H , p, K) , r)odes{Hi)) 

ni rt2 

H, \\H2\\ ■ ■■ \\H 
p = Pic 
K = A',c 

T = pic 
£ = 0} 

Figure 23: The Set of Role Graphs at Procedure Entry 



7.2.1 Role Graphs at Procedure Entry 

Our role analysis creates the set of role graphs at the procedure
entry point from the initial context context(proc). This
is simple because role graphs and the initial context have
similar abstraction relations (Sections 6.1 and 7.1). The difference
is that parameters in role graphs point to exactly one
node, and parameter nodes are onstage nodes in role graphs,
which means that all their edges are "must" edges.

Figure 23 shows the construction of the initial set of role
graphs. First the graph $H_0$ is created such that every parameter
$\mathrm{param}_i(proc)$ references exactly one parameter node
$n_i$. Next graph $H_1$ is created by using localCheck to ensure
that parameter nodes have the appropriate number of edges.
Finally, the instantiation is performed on parameter nodes
to ensure acyclicity constraints if the initial context does not
make them explicit already.
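$$
\begin{array}{lll}
\textbf{Statement } s & \textbf{Transition} & \textbf{Constraints} \\
x = y.f &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle proc, x, n_f \rangle\}, \rho, K, \tau, E \rangle &
  \langle proc, y, n_y \rangle, \langle n_y, f, n_f \rangle \in H;\ \tau(n_f) \in \mathrm{read}(proc) \\
x = y.f &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \bot_G &
  \langle proc, y, n_y \rangle, \langle n_y, f, n_f \rangle \in H;\ \tau(n_f) \notin \mathrm{read}(proc) \\
x.f = y &
  \langle H \uplus \{\langle n_x, f, n_f \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle n_x, f, n_y \rangle\}, \rho, K, \tau, E \rangle &
  \langle proc, x, n_x \rangle, \langle proc, y, n_y \rangle \in H;\ \langle \tau(n_x), f, \tau(n_y) \rangle \in \mathrm{mayWr}(proc) \\
x.f = y &
  \langle H \uplus \{\langle n_x, f, n_f \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle n_x, f, n_y \rangle\}, \rho, K, \tau, E' \rangle &
  \langle proc, x, n_x \rangle, \langle proc, y, n_y \rangle \in H;\ \langle \tau(n_x), f, \tau(n_y) \rangle \in \bigcup_i \mathrm{mustWr}_i(proc);\ E' = \mathrm{updateWr}(E, \langle \tau(n_x), f, \tau(n_y) \rangle) \\
x.f = y &
  \langle H \uplus \{\langle n_x, f, n_f \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \bot_G &
  \langle proc, x, n_x \rangle, \langle proc, y, n_y \rangle \in H;\ \langle \tau(n_x), f, \tau(n_y) \rangle \notin \mathrm{mayWr}(proc) \cup \bigcup_i \mathrm{mustWr}_i(proc) \\
x = \mathrm{new} &
  \langle H \uplus \{\langle proc, x, n_x \rangle\}, \rho, K, \tau, E \rangle \stackrel{s}{\Longrightarrow} \langle H \uplus \{\langle proc, x, n_n \rangle\}, \rho, K, \tau', E \rangle &
  n_n \text{ fresh};\ \tau' = \tau[n_n \mapsto \mathrm{NEW}]
\end{array}
$$

$\mathrm{updateWr}(E, \langle n_1, f, n_2 \rangle) = E[\langle n_1, f \rangle \mapsto n_2]$

Figure 24: Verifying Load, Store, and New Statements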



7.2.2 Verifying Basic Statements 

To ensure that a procedure conforms to its transfer relation
the analysis uses the initial context transformation $\tau$ to assign
every Load and Store statement to a declared effect.
Figure 24 shows the new symbolic execution of Load, Store and
New statements.

The symbolic execution of Load statement x=y.f makes 
sure that the node being loaded is recorded in some read 
effect. If this is not the case, an error is reported. 

The symbolic execution of the Store statement x.f=y first
retrieves nodes $\tau(n_x)$ and $\tau(n_y)$ in the initial role graph
context that correspond to nodes $n_x$ and $n_y$ in the current
role graph. If the effect $\langle \tau(n_x), f, \tau(n_y) \rangle$ is declared as a may
write effect the execution proceeds as usual. If it is declared as a
must write effect, the effect is used to update the list $E$ of
must-write effects. If it is declared as neither, the result is the
error graph $\bot_G$. The list $E$ is checked at the end of procedure execution.
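
The three Store cases of Figure 24 can be summarized in a short sketch. The names `verify_store`, `tau`, `may_wr`, and `must_wr` are hypothetical encodings chosen for illustration: `tau` is a dictionary from role-graph nodes to initial-context nodes, `may_wr` is a set of triples, `must_wr` is a list of sets of triples, and the error graph is represented by `None`.

```python
def verify_store(tau, may_wr, must_wr, E, nx, f, ny):
    """Check the store nx.f = ny against declared write effects.

    Returns the (possibly updated) map E of performed must effects,
    or None to stand for the error graph."""
    effect = (tau[nx], f, tau[ny])
    if effect in may_wr:
        return E                               # may effect: nothing to record
    if any(effect in group for group in must_wr):
        updated = dict(E)
        updated[(tau[nx], f)] = tau[ny]        # updateWr: keep only the last write
        return updated
    return None                                # undeclared effect
```

Keying the map by the pair of initial-context node and field is what gives the must effect folding described earlier: a second must write to the same field replaces the first.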

The symbolic execution of the New statement updates the
initial context transformation $\tau$, assigning $\tau(n_n) = \mathrm{NEW}$ for
the new node $n_n$.

The $\tau$ transformation is similarly updated during other
abstract heap operations. Instantiation of node $n'$ into node
$n_0$ assigns $\tau(n_0) = \tau(n')$, split copies values of $\tau$ into the new
set of isomorphic nodes, and normalization does not merge
nodes $n_1$ and $n_2$ if $\tau(n_1) \ne \tau(n_2)$.



7.2.3 Verifying Procedure Postconditions 

At the end of the procedure, the analysis verifies that $\rho(n_i) =
\mathrm{postR}_i(proc)$ where $\langle proc, \mathrm{param}_i(proc), n_i \rangle \in H$, and then
performs a node check on all onstage nodes using the predicate
$\mathrm{nodeCheck}(n, \langle H, \rho, K \rangle, \mathrm{nodes}(H))$ for all $n \in \mathrm{onstage}(H)$.

At the end of the procedure, the analysis also verifies
that every performed effect in $E = \{e_1, \ldots, e_k\}$ can be attributed
to exactly one declared must effect. This means
that $k = \mathrm{mustWrNo}(proc)$ and there exists a permutation $s$
of the set $\{1, \ldots, k\}$ such that $e_{s(i)} \in \mathrm{mustWr}_i(proc)$ for all $i$.
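
Finding the permutation $s$ is a perfect-matching problem between performed effects and declared must effects. The sketch below uses a standard augmenting-path bipartite matching; the function name and the encoding (performed effects as a list of triples, declared must effects as a list of sets of triples) are assumptions for illustration, not the procedure used by the authors.

```python
def must_effects_satisfied(performed, must_wr):
    """True iff each declared must effect group can be paired with a
    distinct performed effect that belongs to it, with none left over."""
    if len(performed) != len(must_wr):
        return False
    match = {}  # index of performed effect -> index of declared group

    def try_assign(i, visited):
        # Try to give declared group i some performed effect, re-routing
        # earlier assignments along an augmenting path if necessary.
        for j, effect in enumerate(performed):
            if effect in must_wr[i] and j not in visited:
                visited.add(j)
                if j not in match or try_assign(match[j], visited):
                    match[j] = i
                    return True
        return False

    return all(try_assign(i, set()) for i in range(len(must_wr)))
```

For the insert procedure of Figure 21, the single declared must effect on xn.next would be matched by the one folded store to that field.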



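$[\![ proc'(x_1, \ldots, x_p) ]\!](\mathcal{G}) =$
    if $\exists G \in \mathcal{G} : \neg\mathrm{paramCheck}(G)$ then $\{\bot_G\}$
    else try $\mathcal{G}_1 = \mathrm{matchContext}(\mathcal{G})$
        if failed then $\{\bot_G\}$
        else $\{G'' \mid \langle G, \mu \rangle \in \mathcal{G}_1,\ \langle \mathrm{addNEW}(G), \mu \rangle \rightarrow \langle G', \mu \rangle \rightarrow G''\}$

$\mathrm{paramCheck}(\langle H, \rho, K, \tau, E \rangle)$ iff
    $\forall n_i : \mathrm{nodeCheck}(n_i, G, \mathrm{offstage}(H) \cup \{n_i\}_i)$
    where the $n_i$ are such that $\langle proc, x_i, n_i \rangle \in H$

$\mathrm{addNEW}(\langle H, \rho, K, \tau, E \rangle) =$
    $\langle H \cup \{n_0\} \times F \times \{\mathrm{null}\},$
    $\rho[n_0 \mapsto \mathrm{unknown}],$
    $K[n_0 \mapsto s],$
    $\tau[n_0 \mapsto \mathrm{NEW}],$
    $E \rangle$
    where $n_0$ is fresh in $H$

Figure 25: Procedure Call
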
7.3 Analyzing Call Sites 

The set of role graphs at the procedure call site is updated
based on the procedure transfer relation as follows.
Consider procedure proc containing call site $p \in N_{CFG}(proc)$
with procedure call $proc'(x_1, \ldots, x_p)$. Let $\langle H_{IC}, \rho_{IC}, K_{IC} \rangle =
\mathrm{context}(proc')$ be the initial context of the callee.

Figure 25 shows the transfer function for procedure call
sites. It has the following phases:

1. Parameter Check ensures that roles of parameters
conform to the roles expected by the callee proc'.

2. Context Matching (matchContext) ensures that the
caller's role graphs represent a subset of concrete heaps
represented by context(proc'). This is done by deriving
a mapping $\mu$ from the caller's role graph to $\mathrm{nodes}(H_{IC})$.

3. Effect Instantiation uses effects mayWr(proc')
and $\mathrm{mustWr}_i(proc')$ in order to approximate all structural
changes to the role graph that proc' may perform.

4. Role Reconstruction uses final roles for parameter
nodes and global role declarations $\mathrm{postR}_i(proc')$ to
reconstruct roles of all nodes in the part of the role
graph representing the modified region of the heap.

The parameter check requires $\mathrm{nodeCheck}(n_i, G, \mathrm{offstage}(H) \cup
\{n_i\}_i)$ for the parameter nodes $n_i$. The other three phases
are explained in more detail below.

7.3.1 Context Matching 

Figure 26 shows our context matching function. The
matchContext function takes a set $\mathcal{G}$ of role graphs and produces
a set of pairs $\langle G, \mu \rangle$ where $G = \langle H, \rho, K, \tau, E \rangle$ is a role
graph and $\mu$ is a homomorphism from $H$ to $H_{IC}$. The homomorphism
$\mu$ guarantees that $\alpha^{-1}(G) \subseteq \alpha_0^{-1}(\mathrm{context}(proc'))$
since the homomorphism $h_0$ from Definition 29 can be constructed
from homomorphism $h$ in Definition 22 by putting
$h_0 = \mu \circ h$. This implies that it is legal to call proc' with any
concrete graph represented by $G$.

The algorithm in Figure 26 starts with empty maps $\mu =
\mathrm{nodes}(G) \times \{\bot\}$ and extends $\mu$ until it is defined on all
$\mathrm{nodes}(G)$ or there is no way to extend it further. It proceeds
by choosing a role graph $\langle H, \rho, K, \tau, E \rangle$ and node $n_0$
for which the mapping $\mu$ is not defined yet. It then finds
candidates in the initial context that $n_0$ can be mapped to.
The candidates are chosen to make sure that $\mu$ remains a
homomorphism. The accessibility requirement, that a procedure
may see no nodes with incorrect role, is enforced
by making sure that nodes in inaccessible are never mapped
into nodes in read for the callee. As long as this requirement
holds, nodes in inaccessible can be mapped onto nodes of any
role since their role need not be correct anyway. We generally
require that the set $\mu^{-1}(n'_0)$ for an individual node $n'_0$ in
the initial context contain at most one node, and this node
must be individual. In contrast, there might be many individual
and summary nodes mapped onto a summary node.
We relax this requirement by performing instantiation of a
summary node of the caller if, at some point, that is the only
way to extend the mapping (this corresponds to the first
recursive call in the definition of match in Figure 26).

The algorithm is nondeterministic in the order in which
nodes to be matched are selected. One possible ordering
of nodes is depth-first order in the role graph starting from
parameter nodes. If some nondeterministic branch does not
succeed, the algorithm backtracks. The function fails if all
branches fail. In that case the procedure call is considered
illegal and $\bot_G$ is returned. The algorithm terminates since
every procedure call lexicographically increases the sorted
list of numbers $|\mu[\mathrm{nodes}(H)]|$ for $\langle \langle H, \rho, K, \tau, E \rangle, \mu \rangle \in \Gamma$.

7.3.2 Effect Instantiation 

The result of the matching algorithm is a set of pairs $\langle G, \mu \rangle$
of role graphs and mappings. These pairs are used to instantiate
procedure effects in each of the role graphs of the caller.
Figure 1^ gives rules for effect instantiation. The analysis 
first verifies that the region read by the callee is included in
the region read by the caller. Then it uses map $\mu$ to find
the inverse image $S$ of the performed effects. The effects in
$S$ are grouped by the source $n$ and field $f$. Each field $n.f$
is applied in sequence. There are three cases when applying
an effect to $n.f$:

1. There is only one node target of the write in $\mathrm{nodes}(H)$
and the effect is a must write effect. In this case we do
a strong update.

2. The condition in 1) is not satisfied, and the node $n$ is
offstage. In this case we conservatively add all relevant
edges from $S$ to $H$.

3. The condition in 1) is not satisfied, but the node n is 
onstage i.e. it is a parameter node^. In this case there 
is no unique target for n.f, and we cannot add multi- 
ple edges either as this would violate the invariant for 
onstage nodes. We therefore do case analysis choosing 
which effect was performed last. If there are no must ef- 
fects that affect n, then we also consider the case where 
the original graph is unchanged. 

7.3.3 Role Reconstruction 

Procedure effects approximate structural changes to the
heap, but do not provide information about role changes
for non-parameter nodes. We use the role reconstruction
algorithm in Figure 27 to conservatively infer possible
roles of nodes after the procedure call based on role changes
for parameters and global role definitions.

Role reconstruction first finds the set $N_0$ of all nodes that
might be accessed by the callee, since these nodes might have
their roles changed. Then it splits each node $n \in N_0$ into $|R|$
different nodes $p(n, r)$, one for each role $r \in R$. The node
$p(n, r)$ represents the subset of objects that were initially
represented by $n$ and have role $r$ after the procedure executes.
The edges between nodes in the new graph are derived by
simultaneously satisfying 1) structural constraints between
nodes of the original graph; and 2) global role constraints
from the role reference diagram. The nodes $p(n, r)$ not
connected to the parameter nodes are garbage collected in the
role graph. In practice, we generate nodes $p(n, r)$ and edges
on demand starting from parameters, making sure that they
are reachable and satisfy both kinds of constraints.
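
A minimal Python sketch of this splitting and garbage collection step
is given below. It makes several simplifying assumptions that are not
from the report: every node is represented as a `(name, role)` pair,
parameter nodes are the only roots, "connected" is read as reachable
along directed edges, and the role reference diagram is a plain set of
`(role, field, role)` triples. The function name `reconstruct_roles`
and its parameters are hypothetical.

```python
from itertools import product


def reconstruct_roles(edges, roles_before, accessed, params_after, all_roles, rrd):
    """Split accessed nodes by role and keep what parameters can reach.

    edges        -- set of (n1, f, n2) triples of the graph before the call
    roles_before -- dict: node -> role before the call
    accessed     -- set of non-parameter nodes the callee might access
    params_after -- dict: parameter node -> its role after the call
    all_roles    -- set of all role names R
    rrd          -- set of (r1, f, r2) triples allowed by the role definitions
    """
    def versions(n):
        if n in params_after:
            return [(n, params_after[n])]
        if n in accessed:
            return [(n, r) for r in sorted(all_roles)]  # one copy per role
        return [(n, roles_before[n])]                   # untouched node

    new_edges = set()
    for (n1, f, n2) in edges:
        touched = any(n in accessed or n in params_after for n in (n1, n2))
        for v1, v2 in product(versions(n1), versions(n2)):
            # Edges among untouched nodes are kept; all others must be
            # permitted by the role reference diagram.
            if not touched or (v1[1], f, v2[1]) in rrd:
                new_edges.add((v1, f, v2))

    # Garbage collection: keep only nodes reachable from the parameters.
    reachable = set(params_after.items())
    worklist = list(reachable)
    while worklist:
        v = worklist.pop()
        for (a, _, b) in new_edges:
            if a == v and b not in reachable:
                reachable.add(b)
                worklist.append(b)
    kept = {e for e in new_edges if e[0] in reachable}
    return reachable, kept
```

This eager version builds all copies first and collects afterwards; the
on-demand generation mentioned above would avoid creating the copies
that are later discarded.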

8 Extensions 

This section presents two extensions of the basic role system.
The first extension allows a statically unbounded number of
aliases for objects. The second extension allows the analysis
to verify more complex role changes. Additional ways of
extending roles are given in [31].

8.1 Multislots 

A multislot $\langle r', f\rangle \in \mathrm{multislots}(r)$ in the definition of role $r$
allows any number of aliases $\langle o', f, o\rangle \in H_c$ for $\rho_c(o') = r'$
and $\rho_c(o) = r$. We require $\mathrm{multislots}(r)$ to be
disjoint from all $\mathrm{slot}_i(r)$. To handle multislots in role analysis
we relax the condition 5) in Definition 22 of the abstraction

*Non-parameter onstage nodes are never affected by effects,
as guaranteed by the matching algorithm.

$\mathrm{matchContext}(\mathcal{G}) = \mathrm{match}(\{\langle G, \mathrm{nodes}(G) \times \{\bot\}\rangle \mid G \in \mathcal{G}\})$

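$\mathrm{match} : \mathcal{P}(\mathrm{RoleGraphs} \times (N \cup \{\bot\})^N) \to \mathcal{P}(\mathrm{RoleGraphs} \times N^N)$

match($\Gamma$) =
  $\Gamma_0 := \{\langle G, \mu\rangle \in \Gamma \mid \mu^{-1}(\bot) \neq \emptyset\}$;
  if $\Gamma_0 = \emptyset$ then return $\Gamma$;
  $\langle\langle H, \rho, K, \tau, E\rangle, \mu\rangle :=$ choose $\Gamma_0$;
  $\Gamma' = \Gamma \setminus \langle\langle H, \rho, K, \tau, E\rangle, \mu\rangle$;
  paramnodes $:= \{n \mid \exists i : \langle \mathrm{proc}, x_i, n\rangle \in H\}$;
  inaccessible $:= \mathrm{onstage}(H) \setminus$ paramnodes;
  $n_0 :=$ choose $\mu^{-1}(\bot)$;
  candidates $:= \{n' \in \mathrm{nodes}(H_{IC}) \mid$
      ($n_0 \notin$ inaccessible and $\rho_{IC}(n') = \rho(n_0)$) or
      ($n_0 \in$ inaccessible and $n' \notin \mathrm{read}(\mathrm{proc}')$)$\}$
    $\cap \bigcap_{\langle n_0, f, n\rangle \in H} \{n' \mid \langle n', f, \mu(n)\rangle \in H_{IC}\}$
    $\cap \bigcap_{\langle n, f, n_0\rangle \in H} \{n' \mid \langle \mu(n), f, n'\rangle \in H_{IC}\}$;
  if candidates $= \emptyset$ then fail;
  if candidates $= \{n_0'\}$, $K(n_0) = s$, $K_{IC}(n_0') = i$, $\mu^{-1}(n_0') = \emptyset$
  then match($\Gamma' \cup \{\langle G', \mu[n_0 \mapsto n_0']\rangle \mid \langle H, \rho, K, \tau, E\rangle \Uparrow_{n_0} G'\}$)
  else $n_0' :=$ choose $\{n' \in$ candidates $\mid K(n') = s$ or
      ($K(n_0) = i$, $\mu^{-1}(n') = \emptyset$)$\}$;
    match($\Gamma' \cup \langle\langle H, \rho, K, \tau, E\rangle, \mu[n_0 \mapsto n_0']\rangle$);

Figure 26: The Context Matching Algorithm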



relation by allowing $h$ to map more than one concrete edge
$\langle o', f, o\rangle$ onto an abstract edge $\langle n', f, n\rangle \in H$ terminating at an
onstage node $n$ provided that $\langle \rho(n'), f\rangle \in \mathrm{multislots}(\rho(n))$.
The nodeCheck and expansion relation are then extended
appropriately. Note that a role graph does not represent
the exact number of references that fill each multislot. The
analysis therefore does not attempt to recognize actions that
remove the last reference from the multislot. Once an object
plays a role with a multislot, all subsequent roles that it plays
must also have the multislot.

8.2 Cascading Role Changes 

In some cases it is desirable to change the roles of an entire set of
offstage objects without bringing them onstage. We use the
statement setRoleCascade($x_1 : r_1, \ldots, x_n : r_n$) to perform
such a cascading role change of a set of nodes. The need for
cascading role changes arises when roles encode reachability
properties.

Example 33 The code fragment in Figure 28 manipulates
object m of role Data. The role Data has fields buffer and
work, each being a root for a singly linked acyclic list.
Elements of the first list have the BufferNode role and elements
of the second list have the WorkNode role. At some point the
procedure swaps the contents of the fields buffer and work, which
requires all nodes in both lists to change their roles. These
role changes are triggered by the setRoleCascade statement.



role BufferNode {
    fields next : BufferNode | null;
    slots BufferNode.next | Data.buffer;
    acyclic next;
}

role WorkNode {
    fields next : WorkNode | null;
    slots WorkNode.next | Data.work;
    acyclic next;
}

role Data {
    fields buffer : BufferNode | null,
           work : WorkNode | null;
}

roleCheck(m : Data);
x = m.buffer;
y = m.work;
m.buffer = y;
m.work = x;
setRoleCascade(x : WorkNode, y : BufferNode);



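Figure 28: Example of a Cascading Role Change

$\langle\langle H, \rho, K, \tau, E\rangle, \mu\rangle \to \langle H', \rho', K', \tau', E'\rangle$

$\langle \mathrm{proc}, x_i, n_i\rangle \in H$
$N_0 = \mu^{-1}[\mathrm{read}(\mathrm{proc}')]$
$s : N_0 \times R \to N$ where $s(n, r)$ are all different nodes fresh in $H$
$\rho' = \rho \setminus (N_0 \times R) \cup \{\langle s(n, r), r\rangle \mid n \in N_0, r \in R\}$
     $\setminus (\{n_i\}_i \times R) \cup \{\langle n_i, \mathrm{postR}_i(\mathrm{proc})\rangle\}$
$K'(s(n, r)) = K(n)$
$\tau'(s(n, r)) = \tau(n)$
$E' = E$
$H_0 = H \setminus \{\langle n_1, f, n_2\rangle \mid n_1 \in N_0 \text{ or } n_2 \in N_0\}$
     $\cup \{\langle s(n_1, r_1), f, s(n_2, r_2)\rangle \mid \langle n_1, f, n_2\rangle \in H, \langle r_1, f, r_2\rangle \in \mathrm{RRD}\}$
     $\cup \{\langle n_1, f, s(n_2, r_2)\rangle \mid \langle n_1, f, n_2\rangle \in H, \langle \rho_{IC}(\mu(n_1)), f, r_2\rangle \in \mathrm{RRD}\}$
     $\cup \{\langle s(n_1, r_1), f, n_2\rangle \mid \langle n_1, f, n_2\rangle \in H, \langle r_1, f, \rho_{IC}(\mu(n_2))\rangle \in \mathrm{RRD}\}$
$H' = \mathrm{GC}(H_0)$

Figure 27: Call Site Role Reconstruction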



The statement indicates new roles for onstage nodes, and the 
analysis cascades role changes to offstage nodes. 

Given a role graph $\langle H, \rho, K, \tau, E\rangle$, cascading role change finds
a new valid role assignment $\rho'$ where the onstage nodes
have the desired roles and the roles of offstage nodes are
adjusted appropriately. Figure 29 shows abstract execution
of the setRoleCascade statement. Here neighbors($n$, $H$)
denotes the nodes in $H$ adjacent to $n$. The condition
cascadingOk($n, H, \rho, K, \rho'$) makes sure it is legal to change
the role of node $n$ from $\rho(n)$ to $\rho'(n)$ given that the
neighbors of $n$ also change role according to $\rho'$. This check
resembles the check for the setRole statement in Section 6.2.3. Let
$r = \rho(n)$ and $r' = \rho'(n)$. Then cascadingOk($n, H, \rho, K, \rho'$)
requires the following conditions:

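1. $\langle n, f, n_1\rangle \in H$ implies $\rho'(n_1) \in \mathrm{field}_f(r')$

2. $\mathrm{slotno}(r') = \mathrm{slotno}(r) = k$, and for every list
$\langle n_1, f_1, n\rangle, \ldots, \langle n_k, f_k, n\rangle \in H$, if there is a
permutation $p : \{1, \ldots, k\} \to \{1, \ldots, k\}$ such that $\langle \rho(n_i), f_i\rangle \in \mathrm{slot}_{p_i}(r)$,
then there is a permutation $p' : \{1, \ldots, k\} \to \{1, \ldots, k\}$
such that $\langle \rho'(n_i), f_i\rangle \in \mathrm{slot}_{p'_i}(r')$.

3. identity relations were already satisfied or can be
explicitly checked: $\langle f, g\rangle \in \mathrm{identities}(\rho'(n))$ implies

(a) $\langle f, g\rangle \in \mathrm{identities}(\rho(n))$ or

(b) for all $\langle n, f, n'\rangle \in H$: $K(n) = i$, and
if $\langle n', g, n''\rangle \in H$ then $n'' = n$

4. either $\mathrm{acyclic}(\rho'(n)) \subseteq \mathrm{acyclic}(\rho(n))$ or
acycCheck($n, \langle H, \rho', K\rangle, \mathrm{offstage}(H)$).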

In practice there may be zero or more solutions that satisfy 
constraints for a given cascading role change. Selecting any 
solution that satisfies the constraints is sound with respect 
to the original semantics. A useful heuristic for searching 
the solution space is to first explore branches with as few 
roles changed as possible. If no solutions are found, an error 
is reported. 
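
The search heuristic can be sketched in Python as follows, using the
swap of Example 33 as data. This is only an illustration under stated
assumptions: the validity check `respects_rrd` is a hypothetical
stand-in that tests field reference constraints alone, whereas
cascadingOk also checks slots, identities, and acyclicity. The function
name `cascade_roles` and the node names are likewise made up.

```python
from itertools import combinations, product


def cascade_roles(offstage, rho, fixed, all_roles, is_valid):
    """Find a role assignment that changes as few offstage roles as possible.

    offstage  -- list of offstage nodes whose roles may be changed
    rho       -- dict: node -> current role
    fixed     -- dict: onstage node -> role requested by setRoleCascade
    all_roles -- list of all role names
    is_valid  -- predicate on a complete role assignment
    Returns the first valid assignment, or None (an error is reported).
    """
    base = {**rho, **fixed}
    for k in range(len(offstage) + 1):          # fewest changes first
        for changed in combinations(offstage, k):
            options = [[r for r in all_roles if r != rho[n]] for n in changed]
            for choice in product(*options):
                candidate = dict(base)
                candidate.update(zip(changed, choice))
                if is_valid(candidate):
                    return candidate
    return None


# Heap of Example 33 after the swap: m.buffer points to the old work
# list (w1 -> w2) and m.work points to the old buffer list (b1 -> b2).
RRD = {("Data", "buffer", "BufferNode"), ("Data", "work", "WorkNode"),
       ("BufferNode", "next", "BufferNode"), ("WorkNode", "next", "WorkNode")}
EDGES = {("m", "buffer", "w1"), ("m", "work", "b1"),
         ("b1", "next", "b2"), ("w1", "next", "w2")}


def respects_rrd(rho):
    # Simplified check: every edge must be allowed by the role definitions.
    return all((rho[a], f, rho[b]) in RRD for (a, f, b) in EDGES)


RHO = {"m": "Data", "b1": "BufferNode", "b2": "BufferNode",
       "w1": "WorkNode", "w2": "WorkNode"}
FIXED = {"b1": "WorkNode", "w1": "BufferNode"}   # x : WorkNode, y : BufferNode
ROLES = ["BufferNode", "WorkNode", "Data"]
NEW_RHO = cascade_roles(["b2", "w2"], RHO, FIXED, ROLES, respects_rrd)
```

In this example no assignment with zero or one changed offstage role is
valid, so the search settles on changing both `b2` and `w2`.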



9 Related Work 

Typestate, as a type system extension for statically verifying
dynamically changing properties, was proposed in [44, 43].
Aliasing causes problems for typestate-based systems because
the declared typestates of all aliases must change
whenever the state of the referred object changes. Faced
with the complexity of aliasing, [44] resorted to a more
controlled language model which avoids aliasing. More recently
proposed typestate approaches use linear types for heap
references to support state changes of dynamically allocated
objects without addressing aliasing issues.

Motivated by the need to enforce safety properties in
low-level software systems, [42, 46, ...] use extensions of linear
types to describe aliasing of objects and rely on language 
design to avoid non-local type inference. These systems take
a construction-based approach that specifies data structures
as unfoldings of basic elaboration steps [46]. Similarly to
shape types [15, 14] and graph types [29, 34], this allows
tree-like data structures to be expressed more precisely than
using our roles, but cannot approximate data structures such
as sparse matrices. More importantly, this approach makes
it difficult to express nodes that are members of multiple
data structures. Handling multiple data structures is the
essential ingredient of our approach because the role of an
object depends on the data structures in which it participates.

Like shape analysis techniques [...], we have
therefore adopted the constraint-based approach, which
characterizes data structures in terms of the constraints that they
satisfy. The constraint-based approach allows us to handle a
wider range of data structures while giving up some precision.
Like [47, 48] we perform non-local inference of program
properties, but while [47, 48] focus on linear integer constraints
and handle recursive data structures conservatively, we do
not handle integer arithmetic but have a more precise
representation of the heap. At a higher level, these approaches
all focus on detailed properties of individual data structures.
We view our research as focusing more on global aspects such
as the participation of objects in multiple data structures.

The path matrix approaches [18, 17] have been used to
implement efficient interprocedural analyses that infer one
level of referencing relationships, but are not sufficiently
precise to track must aliases of heap objects for programs with
destructive updates of more complex data structures.

$\langle H, \rho, K, \tau, E\rangle \xrightarrow{s} \langle H, \rho', K, \tau, E\rangle$

$s = \mathrm{setRoleCascade}(x_1 : r_1, \ldots, x_n : r_n)$
$n_i : \langle \mathrm{proc}, x_i, n_i\rangle \in H$
$\rho'(n_i) = r_i$
$\rho'(n) = \rho(n)$, $n \in \mathrm{onstage}(H) \setminus \{n_i\}_i$
$N_0 = \{n \in \mathrm{offstage}(H) \mid \exists n' \in \mathrm{neighbors}(n, H) : \rho(n') \neq \rho'(n')\}$
$\forall n \in N_0 : \mathrm{cascadingOk}(n, H, \rho, K, \rho')$

Figure 29: Abstract Execution for setRoleCascade

The use of the instantiation relation in role analysis is
analogous to the materialization operation of [39, 40]. Role
analysis can also track reachability properties, but we use an
abstraction relation based on graph homomorphism rather
than 3-valued logic. Our split operation achieves a similar
goal to the focus operation of [40]. However, the generic
focus algorithm of [32] cannot handle the reachability
predicate which is needed for our split operation. This is because
it conservatively refuses to focus on edges between two
summary nodes to avoid generating an infinite number of
structures. Rather than requiring definite values for the reachability
predicate, our role analysis splits by reachability properties
in the abstract role graph, which illustrates the flexibility
of the homomorphism-based abstraction relation. Another
difference with [40] is that our role analysis does not require
the developer to supply the predicate update formulae for
instrumentation predicates.

A precise interprocedural analysis [38] extends shape analysis
techniques to treat activation records as dynamically
allocated structures. The approach also effectively synthesizes
an application-specific set of contexts. Our approach differs
in that it uses a less precise but more scalable treatment of
procedures. It also uses a compositional approach that
analyzes each procedure once to verify that it conforms to its
specification. Like [48] our interprocedural analysis can
apply both may and must effects, but our contexts are general
graphs with summary nodes and not trees.

Roles are similar to the ADDS and ASAP data structure
description languages [25, 26, ...]. These systems use sound
techniques to apply the data structure invariants for
parallelization and general dependence testing but do not verify
that the data structure invariants are preserved by
destructive updates of data structures [24].

The object-oriented community has long been aware of the
benefits that dynamically changing classes give in large
systems [37]. Recognizing these benefits, researchers have
proposed dynamic techniques that change the class of an object
to reflect its state changes [20, 13]. These systems
illustrate the need for a static system that can verify the
correct use of objects with changing roles.

10 Conclusion 

This paper proposes two key ideas: aliasing relationships
should determine, in large part, the state of each object,
and the type system should use the resulting object states
as its fundamental abstraction for describing procedure
interfaces and object referencing relationships. We present a
role system that realizes these two key ideas in a concrete
system, and present an analysis algorithm that can verify
that the program correctly respects the constraints of this
role system. The result is that programmers can use roles
for a variety of purposes: to ensure the correctness of
extended procedure interfaces that take the roles of parameters
into account, to verify important data structure consistency
properties, to express how procedures move objects between
data structures, and to check that the program correctly
implements correlated relationships between the states of
multiple objects. We therefore expect roles to improve the
reliability of the program and its transparency to developers
and maintainers.

Note. This November 2001 version of the technical report
corrects some errors in the formal description of the analysis
algorithm of the original technical report from July 2001.

Figure 30: Effect Instantiation